Methods Of Diagnosing Or Treating Prostate Cancer Using The ERG Gene, Alone Or In Combination With Other Over Or Under Expressed Genes In Prostate Cancer

ABSTRACT

The present invention relates to oncogenes or tumor suppressor genes, as well as other genes, involved in prostate cancer and their expression products, as well as derivatives and analogs thereof. Provided are therapeutic compositions and methods of detecting and treating cancer, including prostate and other related cancers. Also provided are methods of diagnosing and/or prognosing prostate cancer by determining the expression level of at least one prostate cancer-cell-specific gene, including, for example, the ERG gene or the LTF gene alone, or in combination with at least one of the AMACR gene and the DD3 gene.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of U.S. provisional application Ser. No. 60/568,822, filed May 7, 2004, and Ser. No. 60/622,021, filed Oct. 27, 2004, the entire disclosures of which are relied upon and incorporated by reference.

GOVERNMENT INTEREST

The invention described herein may be manufactured, licensed, and used for governmental purposes without payment of royalties thereon.

FIELD OF THE INVENTION

The present invention relates to oncogenes, tumor suppressor genes, as well as other genes, and their expression products, involved in prostate cancer, as well as derivatives and analogs thereof. The invention further relates to therapeutic compositions and methods of detecting, diagnosing, and treating cancer, including prostate and other related cancers.

BACKGROUND OF THE INVENTION

Prostate cancer (CaP) is the most common malignancy in American men and the second leading cause of cancer mortality (Landis et al. (1999) Cancer J. Clin., 49:8-31; Jemal et al. (2004) Cancer J Clin 54:8-29). The molecular determinants in the development and progression of this disease are poorly understood. In recent years, there have been intensive investigations of the molecular genetics of CaP. To date, however, oncogene, tumor suppressor gene, or other gene alterations common to most CaPs have not been found. Alterations of tumor suppressors such as p53, PTEN and p27, or oncogenes such as BCL2, HER2 and C-MYC associate with only small subsets of primary CaP, with more frequent association observed in advanced CaP.

Current clinical parameters, including serum Prostate Specific Antigen (PSA), tumor stage, and Gleason score, are routinely used as risk factors at the time of diagnosis, but have limited application in identifying patients at a greater risk for developing aggressive CaP. Approximately 30-40% of patients treated with radical prostatectomy for localized CaP have been found to have microscopic disease that is not organ-confined, and a significant portion of these patients relapse (Singh et al., Cancer Cell (2000) 1:203-209; Henshell et al., Can. Res. (2003) 63:4196-4203). Therefore, discovery of novel biomarkers or gene expression patterns defining CaP onset and progression is crucial in identifying patients at greater risk of developing aggressive CaP.

CaP-specific genetic alterations have been the subject of intensive research by several investigators in the past five years (Srikatan et al., In Prostate Cancer, Diagnosis and Surgical Treatment (2002) Springer-Verlag, 25-40; Karan et al., Int. J. Can. (2003) 103:285-293; Augustus et al., In Molecular Pathology of Early Cancer (1999) IOS Press: 321-340; Moul et al., Clin Prostate Cancer (2002) 1:42-50; Lalani et al., Cancer and Mets Rev (1997) 16:29-66; Issacs et al., Epidemiol Rev (2001) 23:36-41; Ozen et al., Anticancer Res (2000) 20:1905-1912; Morton et al., J Natl Med Assoc (1998) 90:S728-731). Promising leads both in biology and translational research areas are beginning to emerge from recent genomics and proteomics technology, as well as traditional approaches. However, the inherent heterogeneity of CaP has hampered the molecular characterization of CaP.

One of the challenges in studying molecular alterations in human cancers, including prostate tumors, is to define the relative contributions of genetic alterations in epithelial and non-epithelial components of the target organ in the process of tumorigenesis. Despite advances in technology, changes in human CaP-specific epithelial and stromal cell-associated gene expression are still not well understood.

Despite recent advances in the identification of molecular alterations associated with certain prostate cancers, the heterogeneous nature of prostate tissue has hindered the identification of genetic targets common to all, or at least the vast majority of, prostate cancers. The complexity and heterogeneity of prostate cancer have also hindered the identification of targets that allow differentiation between clinically aggressive and non-aggressive cancers at the time of diagnosis. Therefore, there remains a need to identify molecular alterations specific for a pathologically defined cell population that can provide important clues for optimal diagnosis and prognosis, and help to establish individualized treatments tailored to the molecular profile of the tumor.

Citation of references herein shall not be construed as an admission that such references are prior art to the present invention.

SUMMARY OF THE INVENTION

It is one of the objects of the present invention to provide methods and kits for detecting cancer, in particular prostate cancer. These methods and kits can be used to detect (either qualitatively or quantitatively) nucleic acids or proteins that serve as cancer markers. For example, the expression of the prostate cancer-cell-specific gene ERG, when detected in a biological sample from a subject, either alone or in combination with other cancer markers, including the expression of other prostate cancer-cell-specific genes, can be used to indicate the presence of prostate cancer in the subject or a higher predisposition of the subject to develop prostate cancer. Detecting ERG expression, alone or in combination with the expression of any gene identified in Tables 1-6, can thus be used to diagnose or prognose cancer, particularly prostate cancer.

According to one aspect of the invention, the method for detecting the expression of one or more prostate cancer cell-specific genes, such as ERG, AMACR, and LTF or the DD3 gene, in a biological sample, comprises:

(a) combining a biological sample with at least a first and a second oligonucleotide primer under hybridizing conditions, wherein the first oligonucleotide primer contains a sequence that hybridizes to a first sequence in a target sequence from a prostate cancer cell-specific gene, such as ERG (SEQ ID NO:1), AMACR (SEQ ID NO:3), and/or LTF (SEQ ID NO:5) and/or DD3 (SEQ ID NO:4), and the second oligonucleotide primer contains a sequence that hybridizes to a second sequence in a nucleic acid strand complementary to the target sequence, wherein the first sequence does not overlap with the second sequence,

(b) adding at least one polymerase activity to produce a plurality of amplification products when the target sequence is present in the biological sample,

(c) adding an oligonucleotide probe that hybridizes to at least one amplification product of the target sequence, and

(d) detecting whether a signal results from hybridization between the oligonucleotide probe and the at least one amplification product, wherein detection of the signal indicates the expression of a prostate cancer cell-specific gene in the biological sample.
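By way of illustration only, the primer arrangement recited in step (a) can be checked in silico before amplification. The following Python sketch is not part of the disclosed method. It assumes exact base pairing in place of hybridization, and the function names and the short toy target are hypothetical (the toy target is not SEQ ID NO:1). The first primer binds the target strand, so its binding site reads as the primer's reverse complement; the second primer binds the complementary strand, so its own sequence appears in the target.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def predicted_amplicon(target: str, first_primer: str, second_primer: str):
    """Return the predicted amplicon, or None if the primers cannot amplify.

    Simplifying assumption: a primer "hybridizes" only where it pairs
    exactly, and only the first matching site is considered.
    """
    target = target.upper()
    # Site bound by the first primer, written in target-strand coordinates.
    first_site = reverse_complement(first_primer)
    # The second primer binds the complementary strand, so it matches the target.
    second_site = second_primer.upper()

    first_start = target.find(first_site)
    second_start = target.find(second_site)
    if first_start < 0 or second_start < 0:
        return None  # at least one primer has no binding site

    second_end = second_start + len(second_site)
    # The two sites must not overlap, and the primers must face each other.
    if second_end > first_start:
        return None
    return target[second_start:first_start + len(first_site)]


# Hypothetical 39-nucleotide toy target used only for demonstration.
TOY_TARGET = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
```

For example, `predicted_amplicon(TOY_TARGET, "CTATCGGGC", "ATGGCCATTG")` returns the full toy target, whereas swapping the two binding sites returns `None` because the primers would no longer converge.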

The method preferably comprises detecting the expression of the following combinations of genes: 1) ERG and AMACR; 2) ERG and DD3; and 3) ERG, AMACR and DD3. In another embodiment, the method comprises detecting LTF and one or more of ERG, AMACR and DD3. Expression of these genes can also be detected by measuring ERG, AMACR or LTF polypeptides in the biological sample.

The biological sample is preferably a prostate tissue, blood, or urine sample. Detecting a signal resulting from hybridization between the oligonucleotide probe and the at least one amplification product can be used to diagnose or prognose cancer, particularly prostate cancer.

The oligonucleotide probe may be optionally fixed to a solid support. When detecting ERG expression in a biological sample, the oligonucleotide probe, first oligonucleotide primer, and second oligonucleotide primer each comprise a nucleic acid sequence that is capable of hybridizing under defined conditions (preferably under high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) to SEQ ID NO:1. Thus, the oligonucleotide probe, first oligonucleotide primer, and second oligonucleotide primer can include, for example, SEQ ID NO:1 itself, or a fragment thereof or a sequence complementary thereto. Preferably the oligonucleotide probe, first oligonucleotide primer, or second oligonucleotide primer is a fragment of SEQ ID NO:1 having at least about 15, at least about 20, or at least about 50 contiguous nucleotides of SEQ ID NO:1 or a sequence complementary thereto. When detecting ERG expression, the target sequence is preferably a fragment of SEQ ID NO:1. Probes, primers, and target sequences can be similarly derived from other genes of interest, such as DD3 (SEQ ID NO:4), and other prostate cancer-cell-specific genes, including, for example, AMACR (SEQ ID NO:3) and LTF (SEQ ID NO:5).
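By way of illustration only, the fragment criterion described above (a run of at least about 15, 20, or 50 contiguous nucleotides of a reference sequence or of its complement) can be expressed as a simple string test. The Python sketch below is an assumption-laden example rather than a disclosed procedure: the function names are hypothetical, the reference is a toy placeholder rather than SEQ ID NO:1, and exact substring matching stands in for hybridization.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_candidate_oligo(candidate: str, reference: str, min_length: int = 15) -> bool:
    """Check that `candidate` is a contiguous fragment of `reference`
    or of its complementary strand, and is at least `min_length` long.
    """
    cand = candidate.upper()
    ref = reference.upper()
    if len(cand) < min_length:
        return False
    return cand in ref or cand in reverse_complement(ref)


# Hypothetical toy reference used only for demonstration.
TOY_REFERENCE = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
```

Passing `min_length=20` or `min_length=50` applies the longer fragment lengths mentioned above.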

In another aspect of the invention, the method of diagnosing or prognosing prostate cancer comprises:

measuring the expression level (e.g. mRNA or polypeptide) of an overexpressed prostate cancer cell-specific gene, such as ERG and/or AMACR, and/or the DD3 gene in a biological sample, and

correlating the expression level of the ERG, AMACR, and/or DD3 gene with the presence of prostate cancer or a higher predisposition to develop prostate cancer in the subject.

In a related aspect of the invention, the method of diagnosing or prognosing prostate cancer comprises:

measuring the expression level (e.g. mRNA or polypeptide) of an underexpressed prostate cancer cell-specific gene, such as LTF, in a biological sample, and

correlating the expression level of the LTF gene with the presence of prostate cancer or a higher predisposition to develop prostate cancer in the subject.

The skilled artisan will understand how to correlate expression levels or patterns of the desired genes with the presence of prostate cancer or a higher predisposition to develop prostate cancer. For example, the expression levels can be quantified such that increased or decreased expression levels relative to a control sample or other standardized value or numerical range indicate the presence of prostate cancer or a higher predisposition to develop prostate cancer.

The increased or decreased expression levels in the methods of the invention may be measured relative to the expression level of the prostate cancer cell-specific gene or polypeptide in normal, matched tissue, such as benign prostate epithelial cells from the same subject. Alternatively, the expression level of a gene or polypeptide may be measured relative to the expression of the gene or polypeptide in other noncancerous samples from the subject or in samples obtained from a different subject without cancer. Expression of a gene may also be normalized by comparing it to the expression of other cancer-specific markers. For example, in prostate cancer, a prostate-cell specific marker, such as PSA, can be used as a control to compare and/or normalize expression levels of other genes, such as ERG, LTF, DD3, and/or AMACR. By way of example, the method of diagnosing or prognosing prostate cancer comprises measuring the expression levels of the ERG, DD3 and/or AMACR gene and diagnosing or prognosing prostate cancer, where an increased expression level of the ERG, DD3, and/or AMACR gene of at least two times as compared to the control sample indicates the presence of prostate cancer or a higher predisposition in the subject to develop prostate cancer. Conversely, by way of example, in such a method of diagnosing or prognosing prostate cancer, a decreased expression of the LTF gene of at least two times as compared to the control sample indicates the presence of prostate cancer or a higher predisposition in the subject to develop prostate cancer.
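By way of illustration only, the two-fold criteria described above can be sketched in Python. The sketch assumes that each expression level has already been measured on a linear scale, that PSA measured in the same sample serves as the normalizing marker, and that "at least two times" is an inclusive threshold. The function and constant names are hypothetical and the numerical values in the usage note are invented for demonstration.

```python
# Markers whose increased expression is treated as indicative, and the
# marker whose decreased expression is treated as indicative.
OVEREXPRESSED_MARKERS = {"ERG", "AMACR", "DD3"}
UNDEREXPRESSED_MARKERS = {"LTF"}


def normalize(level: float, reference_level: float) -> float:
    """Express a gene's level relative to a reference marker (e.g. PSA)."""
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return level / reference_level


def fold_change(test_level: float, control_level: float) -> float:
    """Ratio of the test (e.g. tumor) level to the control (e.g. benign) level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return test_level / control_level


def indicates_cancer(gene: str, fold: float, threshold: float = 2.0) -> bool:
    """Apply the at-least-two-fold criterion in the direction expected for `gene`."""
    if gene in OVEREXPRESSED_MARKERS:
        return fold >= threshold        # at least a two-fold increase
    if gene in UNDEREXPRESSED_MARKERS:
        return fold <= 1.0 / threshold  # at least a two-fold decrease
    raise ValueError(f"no expected direction defined for gene {gene!r}")
```

For instance, an ERG level of 8.0 against PSA 2.0 in tumor cells and 1.0 against PSA 1.0 in matched benign cells gives normalized levels of 4.0 and 1.0, a four-fold increase, so `indicates_cancer("ERG", 4.0)` is true.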

The expression levels of prostate cancer cell-specific genes (e.g., mRNA or polypeptide expression) can be detected according to the methods described herein or using any other known detection methods, including, without limitation, immunohistochemistry, Southern blotting, Northern blotting, Western blotting, ELISA, and nucleic acid amplification procedures, including but not limited to PCR, transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), self-sustained sequence replication (3SR), ligase chain reaction (LCR), strand displacement amplification (SDA), and Loop-Mediated Isothermal Amplification (LAMP).

It is yet another object of the present invention to provide a method of determining a gene expression pattern in a biological sample, where the pattern can be correlated with the presence or absence of tumor cells, particularly prostate tumor cells. For example, ERG is detected in combination with other prostate cancer cell-specific genes (identified in Tables 1-6), including AMACR and/or LTF, to obtain expression profiles from biological samples. The expression profiles of these prostate cancer-cell-specific genes are useful for detecting cancer, particularly prostate cancer. ERG can also be detected in combination with DD3, with or without other prostate cancer cell-specific genes, such as AMACR and/or LTF, to obtain expression profiles from biological samples. These expression profiles are also useful for detecting cancer, particularly prostate cancer. Increased levels of ERG, AMACR, and/or DD3 in a biological sample indicate the presence of prostate cancer or a higher predisposition in the subject to develop prostate cancer. Decreased levels of LTF in a biological sample indicate the presence of prostate cancer or a higher predisposition in the subject to develop prostate cancer.

It is yet another object of the present invention to provide a method of determining a gene expression pattern in a biological sample, where the pattern can be used to indicate or predict the pathologic stage of cancer, particularly prostate cancer. For example, the gene expression pattern can be used to indicate or predict a moderate risk prostate cancer or a high risk prostate cancer or to predict whether the prostate cancer is progressing or regressing or in remission. The gene expression pattern can also be used as a prognostic indicator of disease-free survival following radical prostatectomy. In a particular embodiment, gene expression patterns are derived from the expression level of the ERG gene, alone or in combination with other prostate cancer-cell-specific genes (identified in Tables 1-6), including AMACR and LTF, or DD3.

Kits for detecting cancer, particularly prostate cancer, are also provided. These kits comprise a nucleic acid probe, such as the ones described herein, that hybridizes to a prostate cancer-cell-specific gene. In one embodiment the nucleic acid probe hybridizes to SEQ ID NO:1 (ERG) or the complement thereof under defined hybridization conditions (preferably under high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) and includes SEQ ID NO:1, itself, or a fragment of SEQ ID NO:1 having at least about 15, at least about 20, or at least about 50 contiguous nucleotides of SEQ ID NO:1 or a sequence complementary thereto. In a particular embodiment, the probe selectively hybridizes to the ERG1 and ERG2 isoforms but not to ERG isoforms 3-9. In another embodiment, the probe selectively hybridizes to the ERG1 isoform but not to ERG isoforms 2-9. The nucleic acid probe may be optionally fixed to a solid support.

The kit may also contain at least one additional nucleic acid probe that hybridizes (preferably under high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) to DD3 (SEQ ID NO:4) or a gene identified in Tables 1-6, including, for example, AMACR (SEQ ID NO:3) or LTF (SEQ ID NO:5). In one embodiment, the kit comprises a first oligonucleotide probe capable of hybridizing to SEQ ID NO:1 (ERG) or a sequence complementary thereto under conditions of high stringency and at least one other oligonucleotide probe capable of hybridizing to SEQ ID NO:3 (AMACR) or a sequence complementary thereto, or to SEQ ID NO:4 (DD3) or a sequence complementary thereto, or to a gene identified in Tables 1-6 under conditions of high stringency. In a related embodiment, the kit having an ERG and AMACR probe further comprises a third oligonucleotide probe capable of hybridizing to SEQ ID NO:4 (DD3) or a sequence complementary thereto. The kits described herein may optionally contain an oligonucleotide probe capable of hybridizing to SEQ ID NO:5 (LTF) or a sequence complementary thereto under conditions of high stringency.

The kits may further comprise a first oligonucleotide primer and a second oligonucleotide primer, where the first oligonucleotide primer contains a sequence that hybridizes to a first sequence in SEQ ID NO:1, and the second oligonucleotide primer contains a sequence that hybridizes to a second sequence in a nucleic acid strand complementary to SEQ ID NO:1, wherein the first sequence does not overlap with the second sequence. The first and second oligonucleotide primers are capable of amplifying a target sequence of interest in SEQ ID NO:1. Similarly, the kits can further comprise first and second oligonucleotide primers derived from DD3 (SEQ ID NO:4) or a prostate cancer-cell-specific gene, including, for example, AMACR (SEQ ID NO:3) or LTF (SEQ ID NO:5).

It is another object of the invention to provide therapeutic methods of treating cancer, in particular prostate cancer.

It is yet another object of the present invention to provide screening methods for identifying compounds that modulate expression of a CaP-cell-specific gene, such as ERG, in prostate cancer cells.

The present invention is based in part on the identification of gene expression signatures that correlate with a high risk of CaP progression. Over expression or under expression of specific genes is predictive of tumor progression. The invention provides genes, such as the ERG gene, and analogs of specific genes that can be used alone or in combination with DD3 or other CaP-cell-specific genes, such as AMACR or LTF, to function as diagnostic and prognostic targets for cancer, particularly prostate tumors. The invention further provides genes, such as the ERG gene, and analogs of specific genes that can be used alone or in combination as therapeutic targets for cancer, in particular prostate tumors.

The invention further discloses diagnostic kits comprised of an anti-CaP-cell-specific gene antibody, for example, an anti-ERG gene antibody, which is optionally detectably labeled. A kit is also provided that comprises nucleic acid primer sequences and/or a nucleic acid probe capable of hybridizing under defined conditions (preferably high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) to an ERG nucleic acid. The kits may also contain an anti-DD3 gene antibody or a second anti-CaP-cell-specific gene antibody, such as an anti-AMACR or anti-LTF gene antibody, or a second set of nucleic acid primer sequences and/or a nucleic acid probe capable of hybridizing under defined conditions to the DD3 gene or another CaP-cell-specific gene, such as the AMACR or LTF gene.

The disclosed CaP-cell-specific genes, such as ERG, can be used alone or in combination as biomarkers of cancer, and in particular, prostate cancers and other related diseases, as targets for therapeutic intervention, or as gene therapy agents.

The invention provides for treatment of disorders of hyperproliferation (e.g., cancer, benign tumors) by administering compounds that modulate expression of the specific genes.

Methods of screening cancer cells, and in particular, prostate cancer cells, for specific gene expression signatures, including ERG gene expression signatures, alone or in combination with DD3 gene expression signatures or other CaP-cell-specific gene expression signatures, such as AMACR or LTF, are provided.

Additional objects of the invention will be set forth in part in the description following, and in part will be understood from the description, or may be learned by practice of the invention.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1: Relative expression level of ERG (A), AMACR (B), GSTP1 (C), and LTF (D) in matched tumor and benign prostate epithelial cells analyzed by QRT-PCR (TaqMan). X-axis: CaP patients analyzed (1-20); Y-axis: Expression ratio between tumor versus benign laser capture microdissection (LCM) sample pairs.

FIG. 2: Identification of genes by a distance based MDS and weighted analysis that discriminates between cancerous and benign tissue. (A) Two-dimensional MDS plot elucidating discrimination of 18 tumor samples and 18 benign samples. (B) Hierarchical clustering dendrogram with two major clusters of 18 tumor samples in the right cluster and 18 benign samples in the left cluster.

FIG. 3: A distance based MDS and weighted gene analysis using the tumor over benign ratio (or fold change) data for the identification of genes that can discriminate between high risk CaP and moderate risk CaP. (A) A supervised MDS analysis of 18 samples (9 samples from high risk group and 9 samples from moderate risk group) that ranks the genes according to their impact on minimizing cluster volume and maximizing center-to-center inter cluster distance. (B) Hierarchical clustering of the first 55 genes of the top 200 obtained by the MDS analysis. Genes and samples are arranged as ordered by cluster and treeview. Expression of each gene in each sample is obtained by the tumor over benign ratio or fold change (T/N). Dendrogram at the top of the cluster shows two major clusters, 9 samples of the MR groups in the right cluster and 9 samples of the HR groups in the left cluster. (C) Two-dimensional MDS plot of 18 CaP tumor epithelia that shows the differentiation between the high risk group (9 tumor epithelia) and moderate risk group (9 tumor epithelia) on the basis of the impact of the rank of the genes that discriminate between the HR and MR groups. (D) Hierarchical clustering dendrogram with two major clusters of 9 samples of the MR groups in the left cluster and 8+1 samples of the HR groups in the right cluster. (E) Two-dimensional MDS plot of 18 CaP benign epithelia that shows the discrimination between the high risk group (9 benign epithelia) and moderate risk group (9 benign epithelia) samples.

FIG. 4: In silico validation of the discriminatory potential of the genes obtained from the supervised MDS analysis, using two independent data sets (Welsh et al. 2001, Singh et al. 2002). Two-dimensional MDS plot that shows the discrimination between 7 tumor epithelia of the high risk group and 7 tumor epithelia of the moderate risk group using data from Welsh et al. (A), as well as discrimination between 4 tumor epithelia of the high risk group and 5 tumor epithelia of the moderate risk group using data from Singh et al. (B).

FIG. 5: Combined gene expression analysis of ERG, AMACR and DD3 in tumor and benign prostate epithelial cells of 55 CaP patients. The graphs represent patient distribution by tumor versus benign gene expression ratios according to five gene expression categories: 1) “Up:” greater than 2 fold over expression in tumor compared to benign; 2) “Down:” less than 0.5 fold under expression in tumor compared to benign; 3) “Same:” no significant difference (0.5 to 2 fold); 4) “No expr.:” no detectable gene expression; and 5) “Other:” collectively defines patients with expression category 2, 3 and 4 for the indicated genes (i.e., other than category 1). (A) ERG Expression. (B) AMACR Expression. (C) DD3 Expression. (D) ERG or AMACR Expression. (E) ERG or DD3 Expression. (F) ERG, AMACR, or DD3 Expression.
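By way of illustration only, the five expression categories of FIG. 5 can be written as a small Python sketch. Several points are assumptions rather than statements from the figure: a missing ratio (`None`) stands for no detectable expression, the 0.5 and 2 fold boundaries are assigned to “Same,” and for the combined panels (D) to (F) a patient is scored “Up” when at least one of the indicated genes is “Up” and “Other” otherwise. The function names and the example ratios are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def category(ratio: Optional[float]) -> str:
    """Assign a tumor/benign expression ratio to one of the categories 1-4."""
    if ratio is None:
        return "No expr."  # no detectable gene expression
    if ratio > 2.0:
        return "Up"        # greater than 2 fold over expression
    if ratio < 0.5:
        return "Down"      # less than 0.5 fold under expression
    return "Same"          # 0.5 to 2 fold, no significant difference


def combined_category(ratios: Mapping[str, Optional[float]],
                      genes: Sequence[str]) -> str:
    """Score a patient for a combined panel such as "ERG or AMACR".

    Assumed reading: "Up" if any listed gene is "Up", else "Other".
    """
    if any(category(ratios.get(gene)) == "Up" for gene in genes):
        return "Up"
    return "Other"


# Hypothetical ratios for a single patient, for demonstration only.
EXAMPLE_PATIENT = {"ERG": 0.8, "AMACR": 3.0, "DD3": None}
```

With these example ratios, `combined_category(EXAMPLE_PATIENT, ["ERG", "AMACR"])` gives "Up" because AMACR exceeds 2 fold, while `combined_category(EXAMPLE_PATIENT, ["ERG", "DD3"])` gives "Other".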

FIG. 6: Map of ERG1 and ERG2 isoforms with probe and primer locations. The numbered boxes represent exons, the darker boxes after exon 16 are the 3′ non-coding exon regions. Translational start and stop codons are indicated by star and pound signs, respectively. The locations of the Affymetrix probe set (213541_s_at), the TaqMan probes, the traditional RT-PCR primers, and the in situ hybridization probe are indicated.

FIG. 7: Correlation of ERG1 expression and PSA recurrence-free survival. Kaplan-Meier analysis of correlation with post-prostatectomy PSA recurrence-free survival was performed on 95 CaP patients having detectable levels of ERG1 mRNA by real time QRT-PCR (TaqMan). Kaplan-Meier survival curves were stratified by the following ERG1 expression categories: 1) greater than 100 fold over expression; 2) 2-100 fold over expression; and 3) less than 2 fold over expression or under expression of ERG1 in the prostate tumor cells. The p value was 0.0006.

FIG. 8: In situ hybridization images in 7 CaP patients were analysed by the Open-Lab image analysis software (Improvisation, Lexington, Mass.) coupled to a microscope via a cooled digital camera (Leica Microsystems, Heidelberg, Germany). Density (OD) values for tumor (dark columns) and benign (light columns) epithelium are shown on the Y axis, and patients 1-7 are shown on the X axis. Patient No. 7 was added as a control with no significant ERG1 expression difference between tumor and benign cells by QRT-PCR (TaqMan). Statistical analysis was performed with the SPSS software package.

FIG. 9: ERG1 is represented as a modular structure. The two conserved regions, namely the SAM-PNT Domain (Protein/RNA interaction domain) and the ETS Domain (Interaction with DNA), are shaded.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

The term “CaP-cell-specific gene,” or “prostate cancer-cell-specific gene,” refers to a gene identified in Tables 1-6. The definition further encompasses CaP-cell-specific gene analogs, e.g., orthologues and homologues, and functionally equivalent fragments of CaP-cell-specific genes or their analogs, the expression of which is either upregulated or downregulated in prostate cancer cells.

The term “CaP-cell-specific gene expression signature” refers to the pattern of upregulation or downregulation of product expression as measured by the Affymetrix GeneChip assay described in Example 1, the QRT-PCR assay described in Example 2, or any other quantitative expression assay known in the art.

The term “ERG” refers to the ERG gene or ERG cDNA or mRNA described herein, and includes ERG isoforms, such as ERG1 and ERG2. The cDNA sequence of the ERG1 gene is published in GenBank under the accession number M21535. The cDNA sequence of the ERG2 gene is published in GenBank under the accession number M17254.

The term “AMACR” refers to the AMACR gene or AMACR cDNA or mRNA described herein, and includes AMACR isoforms. The cDNA sequence of the AMACR gene is published in GenBank under the accession number NM_014324.

The term “DD3” refers to the DD3 gene or DD3 cDNA or mRNA described herein, and includes DD3 isoforms. The cDNA sequence of the DD3 gene is published in GenBank under the accession number AF103907 and is also disclosed in WO 98/45420 (1998). Although DD3 was originally used to describe a fragment of exon 4 of the prostate cancer antigen 3 (PCA3) gene, the term, as used herein, is not so limited. DD3 is intended to refer to the entire DD3 gene or cDNA or mRNA, which in the art is also commonly referred to as PCA3.

The term “LTF” refers to the LTF gene or LTF cDNA or mRNA described herein and includes LTF isoforms. The cDNA sequence of the LTF gene is published in GenBank under the accession number NM_002343.

The term “polypeptide” is used interchangeably with the terms “peptide” and “protein” and refers to any chain of amino acids, regardless of length or posttranslational modification (e.g., glycosylation or phosphorylation), or source (e.g., species).

The phrase “substantially identical,” or “substantially as set out,” means that a relevant sequence is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to a given sequence. By way of example, such sequences may be allelic variants, sequences derived from various species, or they may be derived from the given sequence by truncation, deletion, amino acid substitution or addition. For polypeptides, the length of comparison sequences will generally be at least 20, 30, 50, 100 or more amino acids. For nucleic acids, the length of comparison sequences will generally be at least 50, 100, 150, 300, or more nucleotides. Percent identity between two sequences is determined by standard alignment algorithms such as, for example, Basic Local Alignment Search Tool (BLAST) described in Altschul et al. (1990) J. Mol. Biol., 215:403-410, the algorithm of Needleman et al. (1970) J. Mol. Biol., 48:444-453, or the algorithm of Meyers et al. (1988) Comput. Appl. Biosci., 4:11-17.
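By way of illustration only, once two sequences have been aligned by one of the algorithms cited above, percent identity can be computed from the aligned strings. The Python sketch below does not perform the alignment itself. It assumes gapped input of equal length with `-` as the gap character, and it counts identity over all alignment columns, which is only one of several conventions in use. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def substantially_identical(aligned_a: str, aligned_b: str,
                            threshold: float = 70.0) -> bool:
    """Apply a percent-identity threshold (70% is the lowest value recited)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Raising `threshold` to 80, 90, or 99 applies the stricter levels recited in the definition.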

The terms “specific interaction,” “specific binding,” or the like, mean that two molecules form a complex that is relatively stable under physiologic conditions. The term is also applicable where, e.g., an antigen-binding domain is specific for a particular epitope, which is carried by a number of antigens, in which case the specific binding member carrying the antigen-binding domain will be able to bind to the various antigens carrying the epitope. Specific binding is characterized by a high affinity and a low to moderate capacity. Nonspecific binding usually has a low affinity with a moderate to high capacity. Typically, the binding is considered specific when the affinity constant K_a is higher than 10⁶ M⁻¹, more preferably higher than 10⁷ M⁻¹, and most preferably 10⁸ M⁻¹. If necessary, non-specific binding can be reduced without substantially affecting specific binding by varying the binding conditions. Such conditions are known in the art, and a skilled artisan using routine techniques can select appropriate conditions. The conditions are usually defined in terms of concentration of antibodies, ionic strength of the solution, temperature, time allowed for binding, concentration of non-related molecules (e.g., serum albumin, milk casein), etc.

The term “detectably labeled” refers to any means for marking and identifying the presence of a molecule, e.g., an oligonucleotide probe or primer, a gene or fragment thereof, or a cDNA molecule. Methods for labeling a molecule are well known in the art and include, without limitation, radioactive labeling (e.g., with an isotope such as ³²P, ³⁵S, or ¹²⁵I) and nonradioactive labeling (e.g., fluorescent and chemiluminescent labeling).

The term “modulatory compound” is used interchangeably with the term “therapeutic” and, as used herein, means any compound capable of “modulating” either CaP-cell-specific gene expression at the transcriptional, translational, or post-translational levels or modulating the biological activity of a CaP-cell-specific polypeptide. The term “modulate” and its cognates refer to the capability of a compound acting as either an agonist or an antagonist of a certain reaction or activity. The term modulate, therefore, encompasses the terms “activate” and “inhibit.” The term “activate,” for example, refers to an increase in the expression of the CaP-cell-specific gene or activity of a CaP-cell-specific polypeptide in the presence of a modulatory compound, relative to the activity of the gene or the polypeptide in the absence of the same compound. The increase in the expression level or the activity is preferably at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher. Analogously, the term “inhibit” refers to a decrease in the expression of the CaP-cell-specific gene or activity of a CaP-cell-specific polypeptide in the presence of a modulatory compound, relative to the activity of the gene or the polypeptide in the absence of the same compound. The decrease in the expression level or the activity is preferably at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher. The expression level of the CaP-cell-specific gene or activity of a CaP-cell-specific polypeptide can be measured as described herein or by techniques generally known in the art.

The term “treatment” is used interchangeably herein with the term “therapeutic method” and refers to both therapeutic treatment and prophylactic/preventative measures. Those in need of treatment may include individuals already having a particular medical disorder as well as those who may ultimately acquire the disorder.

The term “isolated” refers to a molecule that is substantially free of its natural environment. Any amount of that molecule elevated over the naturally occurring levels due to any manipulation, e.g., over expression, partial purification, etc., is encompassed within the definition. With regard to partially purified compositions only, the term refers to an isolated compound that is at least 50-70%, 70-90%, 90-95% (w/w), or more pure.

The term “effective dose,” or “effective amount,” refers to that amount of the compound that results in amelioration of symptoms in a patient or a desired biological outcome, e.g., inhibition of cell proliferation. The effective amount can be determined as described in the subsequent sections.

The terms “polynucleotide,” “oligonucleotide,” “nucleic acid,” and “DNA” are used interchangeably herein and refer to deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include nucleotide analogs, and single or double stranded polynucleotides. Examples of polynucleotides include, but are not limited to, plasmid DNA or fragments thereof, viral DNA or RNA, anti-sense RNA, etc. The term “plasmid DNA” refers to double stranded DNA that is circular.

As used herein the term “hybridization under defined conditions,” or “hybridizing under defined conditions,” is intended to describe conditions for hybridization and washes under which nucleotide sequences that are significantly identical or homologous to each other remain bound to each other. The conditions are such that sequences, which are at least about 6 and more preferably at least about 20, 50, 100, 150, 300, or more nucleotides long and at least about 70%, more preferably at least about 80%, even more preferably at least about 85-90% identical, remain bound to each other. The percent identity can be determined as described in Altschul et al. (1997) Nucleic Acids Res., 25: 3389-3402.
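
As a rough illustration of the length and identity figures recited above, the following sketch computes a simple percent identity over two sequences that have already been aligned to equal length. This is a simplification assumed for illustration; it is not the gapped alignment procedure of Altschul et al., and it does not predict whether two sequences will actually remain bound under any particular hybridization and wash conditions. The function names and default thresholds (6 nucleotides, 70%) are hypothetical choices taken from the lowest values in the definition.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same nucleotide (gaps never match)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(seq_a.lower(), seq_b.lower()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_defined_thresholds(seq_a: str, seq_b: str,
                             min_length: int = 6,
                             min_identity_pct: float = 70.0) -> bool:
    """Check only the length and identity figures; thresholds are illustrative."""
    return (len(seq_a) >= min_length
            and percent_identity(seq_a, seq_b) >= min_identity_pct)
```

Two aligned 10-mers differing at a single position are 90% identical and satisfy both illustrative thresholds, whereas a 5-mer fails the length check regardless of its identity.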

Appropriate hybridization conditions can be selected by those skilled in the art with minimal experimentation as exemplified in Ausubel et al. (2004), Current Protocols in Molecular Biology, John Wiley & Sons. Additionally, stringent conditions are described in Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press. A nonlimiting example of defined conditions of low stringency is as follows. Filters containing DNA are pretreated for 6 hours at 40° C. in a solution containing 35% formamide, 5×SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 μg/ml denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 μg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20×10⁶ cpm ³²P-labeled probe is used. Filters are incubated in hybridization mixture for 18-20 hours at 40° C., and then washed for 1.5 hours at 55° C. in a solution containing 2×SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and incubated an additional 1.5 hours at 60° C. Filters are blotted dry and exposed for autoradiography. Other conditions of low stringency well known in the art may be used (e.g., as employed for cross-species hybridizations).

A non-limiting example of defined conditions of high stringency is as follows. Prehybridization of filters containing DNA is carried out for 8 hours to overnight at 65° C. in buffer composed of 6×SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 μg/ml denatured salmon sperm DNA. Filters are hybridized for 48 hours at 65° C. in the prehybridization mixture containing 100 μg/ml denatured salmon sperm DNA and 5-20×10⁶ cpm of ³²P-labeled probe. Washing of filters is done at 37° C. for 1 hour in a solution containing 2×SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA. This is followed by a wash in 0.1×SSC at 50° C. for 45 minutes. Other conditions of high stringency well known in the art may be used. An oligonucleotide hybridizes specifically to a target sequence under high stringency conditions.

The term “solid support” means a material that is essentially insoluble under the solvent and temperature conditions of the assay method, comprising free chemical groups available for joining an oligonucleotide or nucleic acid. Preferably, the solid support is covalently coupled to an oligonucleotide designed to directly or indirectly bind a target nucleic acid. When the target nucleic acid is an mRNA, the oligonucleotide attached to the solid support is preferably a poly-T sequence. A preferred solid support is a particle, such as a micron- or submicron-sized bead or sphere. A variety of solid support materials are contemplated, such as, for example, silica, polyacrylate, polyacrylamide, a metal, polystyrene, latex, nitrocellulose, polypropylene, nylon or combinations thereof. More preferably, the solid support is capable of being attracted to a location by means of a magnetic field, such as a solid support having a magnetite core. Particularly preferred supports are monodisperse magnetic spheres (i.e., uniform size ± about 5%).

The term “detecting” or “detection” means any of a variety of methods for determining the presence of a nucleic acid, such as, for example, hybridizing a labeled probe to a portion of the nucleic acid. A labeled probe is an oligonucleotide that specifically binds to another sequence and contains a detectable group which may be, for example, a fluorescent moiety, a chemiluminescent moiety (such as an acridinium ester (AE) moiety that can be detected chemiluminescently under appropriate conditions (as described in U.S. Pat. No. 5,283,174)), a radioisotope, biotin, avidin, enzyme, enzyme substrate, or other reactive group. Other well-known detection techniques include, for example, gel filtration, gel electrophoresis and visualization of the amplicons, and High Performance Liquid Chromatography (HPLC). As used throughout the specification, the term “detecting” or “detection” includes either qualitative or quantitative detection.

The term “primer” or “oligonucleotide primer” means an oligonucleotide capable of binding to a region of a target nucleic acid or its complement and promoting nucleic acid amplification of the target nucleic acid. Generally, a primer will have a free 3′ end that can be extended by a nucleic acid polymerase. Primers also generally include a base sequence capable of hybridizing via complementary base interactions either directly with at least one strand of the target nucleic acid or with a strand that is complementary to the target sequence. A primer may comprise target-specific sequences and optionally other sequences that are non-complementary to the target sequence. These non-complementary sequences may comprise a promoter sequence or a restriction endonuclease recognition site.

CaP-Cell-Specific Gene Expression Signature Identification

The present invention is based in part on the identification and validation of consistent CaP epithelial cell specific gene expression signatures. These gene expression signatures define patients with CaP who are at risk to develop advanced disease by identifying genes and pathways in prostate epithelial cells that differentiate between aggressive and non-aggressive courses of cancer development. Two patient groups were selected: a high risk (HR) group having, for example, PSA recurrence, Gleason score 8-9, T3c stage, seminal vesicle invasion, and poor tumor differentiation, and a moderate risk (MR) group having, for example, no PSA recurrence, Gleason score 6-7, T2a-T3b stage, no seminal vesicle invasion, and well or moderate tumor differentiation. The two patient groups were matched for known risk factors: age, race, and family history of CaP. LCM derived epithelial cells from tumor and normal prostate of the two patient groups were compared by GeneChip analyses, as described in the following Example 1. Results were validated using quantitative reverse transcriptase PCR (QRT-PCR), as described in the following Example 2. The group of genes identified and validated as having the highest association with aggressive or non-aggressive CaP can be used to reliably determine the likely course of CaP progression.

Strikingly, one of the most consistently over expressed genes in CaP cells identified in this study was the ERG (ETS related gene) oncogene, a member of the ETS transcription factor family. (Oikawa et al., Gene (2003) 303:11-34; Sharrocks, A D, Nat Rev Mol Cell Biol (2001) 2(11):827-37; Hart et al., Oncogene (1995) 10(7):1423-30; Rao et al., Science (1987) 237(4815): 635-639). Two isoforms of ERG, ERG1 and ERG2, are over expressed with the highest frequency. The ERG1 coding sequence (with start and stop codons underlined) is publicly available through GenBank under the accession number M21535, as follows:

(SEQ ID NO: 1)   1 gaattccctc caaagcaaga caaatgactc acagagaaaa aagatggcag aaccaagggc  61 aactaaagcc gtcaggttct gaacagctgg tagatgggct ggcttactga aggacatgat 121 tcagactgtc ccggacccag cagctcatat caaggaactc tcctgatgaa tgcagtgtgg 181 ccaaaggcgg gaagatggtg ggcagcccag acaccgttgg gatgaactac ggcagctaca 241 tggaggagaa gcacatgcca cccccaaaca tgaccacgaa cgagcgcaga gttatcgtgc 301 cagcagatcc tacgctatgg agtacagacc atgtgcggca gtggctggag tgggcggtga 361 aagaatatgg ccttccagac gtcaacatct tgttattcca gaacatcgat gggaaggaac 421 tgtgcaagat gaccaaggac gacttccaga ggctcacccc cagctacaac gccgacatcc 481 ttctctcaca tctccactac ctcagagaga ctcctcttcc acatttgact tcagatgatg 541 ttgataaagc cttacaaaac tctccacggt taatgcatgc tagaaacaca gatttaccat 601 atgagccccc caggagatca gcctggaccg gtcacggcca ccccacgccc cagtcgaaag 661 ctgctcaacc atctccttcc acagtgccca aaactgaaga ccagcgtcct cagttagatc 721 cttatcagat tcttggacca acaagtagcc gccttgcaaa tccaggcagt ggccagatcc 781 agctttggca gttcctcctg gagctcctgt cggacagctc caactccagc tgcatcacct 841 gggaaggcac caacggggag ttcaagatga cggatcccga cgaggtggcc cggcgctggg 901 gagagcggaa gagcaaaccc aacatgaact acgataagct cagccgcgcc ctccgttact 961 actatgacaa gaacatcatg accaaggtcc atgggaagcg ctacgcctac aagttcgact1021 tccacgggat cgcccaggcc ctccagcccc accccccgga gtcatctctg tacaagtacc1081 cctcagacct cccgtacatg ggctcctatc acgcccaccc acagaagatg aactttgtgg1141 cgccccaccc tccagccctc cccgtgacat cttccagttt ttttgctgcc ccaaacccat1201 actggaattc accaactggg ggtatatacc ccaacactag gctccccacc agccatatgc1261 cttctcatct gggcacttac tactaaagac ctggcggagg cttttcccat cagcgtgcat1321 tcaccagccc atcgccacaa actctatcgg agaacatgaa tcaaaagtgc ctcaagagga1381 atgaaaaaag ctttactggg gctggggaag gaagccgggg aagagatcca aagactcttg1441 ggagggagtt actgaagtct tactgaagtc ttactacaga aatgaggagg atgctaaaaa1501 tgtcacgaat atggacatat catctgtgga ctgaccttgt aaaagacagt gtatgtagaa1561 gcatgaagtc ttaaggacaa agtgccaaag aaagtggtct taagaaatgt ataaacttta1621 gagtagagtt tgaatcccac taatgcaaac tgggatgaaa ctaaagcaat agaaacaaca1681 cagttttgac 
ctaacatacc gtttataatg ccattttaag gaaaactacc tgtatttaaa1741 aatagtttca tatcaaaaac aagagaaaag acacgagaga gactgtggcc catcaacaga1801 cgttgatatg caactgcatg gcatgtgctg ttttggttga aatcaaatac attccgtttg1861 atggacagct gtcagctttc tcaaactgtg aagatgaccc aaagtttcca actcctttac1921 agtattaccg ggactatgaa ctaaaaggtg ggactgagga tgtgtataga gtgagcgtgt1981 gattgtagac agaggggtga agaaggagga ggaagaggca gagaaggagg agaccaggct2041 gggaaagaaa cttctcaagc aatgaagact ggactcagga catttgggga ctgtgtacaa2101 tgagttatgg agactcgagg gttcatgcag tcagtgttat accaaaccca gtgttaggag2161 aaaggacaca gcgtaatgga gaaagggaag tagtagaatt cagaaacaaa aatgcgcatc2221 tctttctttg tttgtcaaat gaaaatttta actggaattg tctgatattt aagagaaaca2281 ttcaggacct catcattatg tgggggcttt gttctccaca gggtcaggta agagatggcc2341 ttcttggctg ccacaatcag aaatcacgca ggcattttgg gtaggcggcc tccagttttc2401 ctttgagtcg cgaacgctgt gcgtttgtca gaatgaagta tacaagtcaa tgtttttccc2461 cctttttata taataattat ataacttatg catttataca ctacgagttg atctcggcca2521 gccaaagaca cacgacaaaa gagacaatcg atataatgtg gccttgaatt ttaactctgt2581 atgcttaatg tttacaatat gaagttatta gttcttagaa tgcagaatgt atgtaataaa2641 ataagcttgg cctagcatgg caaatcagat ttatacagga gtctgcattt gcactttttt2701 tagtgactaa agttgcttaa tgaaaacatg tgctgaatgt tgtggatttt gtgttataat2761 ttactttgtc caggaacttg tgcaagggag agccaaggaa ataggatgtt tggcacccaa2821 atggcgtcag cctctccagg tccttcttgc ctcccctcct gtcttttatt tctagcccct2881 tttggaacag gaaggacccc ggggtttcaa ttggagcctc catatttatg cctggaagga2941 aagaggccta tgaagctggg gttgtcattg agaaattcta gttcagcacc tggtcacaaa3001 tcacccttaa ttctgctatg attaaaatac atttgttgaa cagtgaacaa gctaccactc3061 gtaaggcaaa ctgtattatt actggcaaat aaagcgtcat ggatagctgc aatttctcac3121 tttaca

Nucleotides 195-1286 of SEQ ID NO:1 represent the coding sequence of SEQ ID NO:1.
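
For readers working with the numbered listings electronically, the sketch below shows one way to strip the position numbers from a listing and extract a coding span given as 1-based, inclusive coordinates, followed by a basic plausibility check (length divisible by three, ATG start, recognized stop codon). The helper names are hypothetical, and the input is assumed to contain only the numbered sequence lines, without the “SEQ ID NO” header. For SEQ ID NO:1, the span 195-1286 is 1,092 nucleotides long, i.e., 364 codons including the stop codon.

```python
import re

STOP_CODONS = {"taa", "tag", "tga"}


def parse_numbered_sequence(text: str) -> str:
    """Drop position numbers and whitespace, keeping only a/c/g/t letters."""
    return "".join(re.findall(r"[acgt]+", text.lower()))


def extract_cds(sequence: str, start: int, end: int) -> str:
    """Return the 1-based, inclusive span start..end of the sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def looks_like_cds(cds: str) -> bool:
    """Basic sanity check: whole codons, ATG start, and a terminal stop codon."""
    return (len(cds) % 3 == 0
            and cds.startswith("atg")
            and cds[-3:] in STOP_CODONS)


# Toy listing (not taken from the sequences above): positions 3-11 hold a short ORF.
toy = parse_numbered_sequence("  1 ccatggctta 11 agg")
toy_cds = extract_cds(toy, 3, 11)
```

The same helpers could be applied to the full listing with the coordinates stated for each sequence; a failed check would suggest that the listing was damaged during transcription.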

The ERG2 coding sequence is publicly available through GenBank under the accession number M17254, as follows (with start and stop codons underlined):

(SEQ ID NO: 2)   1 gtccgcgcgt gtccgcgccc gcgtgtgcca gcgcgcgtgc cttggccgtg cgcgccgagc  61 cgggtcgcac taactccctc ggcgccgacg gcggcgctaa cctctcggtt attccaggat 121 ctttggagac ccgaggaaag ccgtgttgac caaaagcaag acaaatgact cacagagaaa 181 aaagatggca gaaccaaggg caactaaagc cgtcaggttc tgaacagctg gtagatgggc 241 tggcttactg aaggacatga ttcagactgt cccggaccca gcagctcata tcaaggaagc 301 cttatcagtt gtgagtgagg accagtcgtt gtttgagtgt gcctacggaa cgccacacct 361 ggctaagaca gagatgaccg cgtcctcctc cagcgactat ggacagactt ccaagatgag 421 cccacgcgtc cctcagcagg attggctgtc tcaaccccca gccagggtca ccatcaaaat 481 ggaatgtaac cctagccagg tgaatggctc aaggaactct cctgatgaat gcagtgtggc 541 caaaggcggg aagatggtgg gcagcccaga caccgttggg atgaactacg gcagctacat 601 ggaggagaag cacatgccac ccccaaacat gaccacgaac gagcgcagag ttatcgtgcc 661 agcagatcct acgctatgga gtacagacca tgtgcggcag tggctggagt gggcggtgaa 721 agaatatggc cttccagacg tcaacatctt gttattccag aacatcgatg ggaaggaact 781 gtgcaagatg accaaggacg acttccagag gctcaccccc agctacaacg ccgacatcct 841 tctctcacat ctccactacc tcagagagac tcctcttcca catttgactt cagatgatgt 901 tgataaagcc ttacaaaact ctccacggtt aatgcatgct agaaacacag atttaccata 961 tgagcccccc aggagatcag cctggaccgg tcacggccac cccacgcccc agtcgaaagc1021 tgctcaacca tctccttcca cagtgcccaa aactgaagac cagcgtcctc agttagatcc1081 ttatcagatt cttggaccaa caagtagccg ccttgcaaat ccaggcagtg gccagatcca1141 gctttggcag ttcctcctgg agctcctgtc ggacagctcc aactccagct gcatcacctg1201 ggaaggcacc aacggggagt tcaagatgac ggatcccgac gaggtggccc ggcgctgggg1261 agagcggaag agcaaaccca acatgaacta cgataagctc agccgcgccc tccgttacta1321 ctatgacaag aacatcatga ccaaggtcca tgggaagcgc tacgcctaca agttcgactt1381 ccacgggatc gcccaggccc tccagcccca ccccccggag tcatctctgt acaagtaccc1441 ctcagacctc ccgtacatgg gctcctatca cgcccaccca cagaagatga actttgtggc1501 gccccaccct ccagccctcc ccgtgacatc ttccagtttt tttgctgccc caaacccata1561 ctggaattca ccaactgggg gtatataccc caacactagg ctccccacca gccatatgcc1621 ttctcatctg ggcacttact actaaagacc tggcggaggc ttttcccatc agcgtgcatt1681 caccagccca 
tcgccacaaa ctctatcgga gaacatgaat caaaagtgcc tcaagaggaa1741 tgaaaaaagc tttactgggg ctggggaagg aagccgggga agagatccaa agactcttgg1801 gagggagtta ctgaagtctt actacagaaa tgaggaggat gctaaaaatg tcacgaatat1861 ggacatatca tctgtggact gaccttgtaa aagacagtgt atgtagaagc atgaagtctt1921 aaggacaaag tgccaaagaa agtggtctta agaaatgtat aaactttaga gtagagtttg1981 aatcccacta atgcaaactg ggatgaaact aaagcaatag aaacaacaca gttttgacct2041 aacataccgt ttataatgcc attttaagga aaactacctg tatttaaaaa tagtttcata2101 tcaaaaacaa gagaaaagac acgagagaga ctgtggccca tcaacagacg ttgatatgca2161 actgcatggc atgtgctgtt ttggttgaaa tcaaatacat tccgtttgat ggacagctgt2221 cagctttctc aaactgtgaa gatgacccaa agtttccaac tcctttacag tattaccggg2281 actatgaact aaaaggtggg actgaggatg tgtatagagt gagcgtgtga ttgtagacag2341 aggggtgaag aaggaggagg aagaggcaga gaaggaggag accaggctgg gaaagaaact2401 tctcaagcaa tgaagactgg actcaggaca tttggggact gtgtacaatg agttatggag2461 actcgagggt tcatgcagtc agtgttatac caaacccagt gttaggagaa aggacacagc2521 gtaatggaga aagggaagta gtagaattca gaaacaaaaa tgcgcatctc tttctttgtt2581 tgtcaaatga aaattttaac tggaattgtc tgatatttaa gagaaacatt caggacctca2641 tcattatgtg ggggctttgt tctccacagg gtcaggtaag agatggcctt cttggctgcc2701 acaatcagaa atcacgcagg cattttgggt aggcggcctc cagttttcct ttgagtcgcg2761 aacgctgtgc gtttgtcaga atgaagtata caagtcaatg tttttccccc tttttatata2821 ataattatat aacttatgca tttatacact acgagttgat ctcggccagc caaagacaca2881 cgacaaaaga gacaatcgat ataatgtggc cttgaatttt aactctgtat gcttaatgtt2941 tacaatatga agttattagt tcttagaatg cagaatgtat gtaataaaat aagcttggcc3001 tagcatggca aatcagattt atacaggagt ctgcatttgc acttttttta gtgactaaag3061 ttgcttaatg aaaacatgtg ctgaatgttg tggattttgt gttataattt actttgtcca3121 ggaacttgtg caagggagag ccaaggaaat aggatgtttg gcaccc

Nucleotides 257-1645 of SEQ ID NO:2 represent the coding sequence of SEQ ID NO:2.

Validation by QRT-PCR (TaqMan) in microdissected tumor and benign prostate epithelial cells of 20 CaP patients confirmed a consistent, tumor associated over expression of ERG isoforms ERG1 and/or ERG2 in 95% of patients (19 of 20) (FIG. 1A). As a quality test and comparison, the expression of AMACR, a recently identified CaP tissue marker (Rubin et al., JAMA (2002) 287:1662-1670; Luo et al., Cancer Res (2002) 62:2220-2226), and of GSTP1, a gene known to have decreased expression in CaP (Nelson et al., Ann N Y Acad Sci (2001) 952: 135-144), was also determined (FIGS. 1B and 1C). Robust over expression in CaP cells of 95% of the patients, similarly to ERG, was observed for AMACR, while the GSTP1 expression was significantly decreased in the tumor cells of each CaP patient, confirming the high quality of the tumor and benign LCM specimens and the reliability of the QRT-PCR.

Recently, a detailed mapping of the chromosomal region (21q22.2-q22.3) containing the ERG gene, as well as a complete exon-intron structure with 9 alternative transcripts (or isoforms), have been described. (Owczarek et al., Gene (2004) 324: 65-77). The probes on the Affymetrix GeneChip used in our initial discovery of consistent ERG over expression in CaP, as well as the TaqMan probe designed for the validation experiment, recognize a region specific to the ERG 1 and 2 isoforms only.

Both ERG and ETS are proto-oncogenes with mitogenic and transforming activity. (Sharrocks, A D, Nat Rev Mol Cell Biol (2001) 2(11):827-37; Seth et al., Proc Natl Acad Sci USA (1989) 86:7833-7837). Deregulation or chromosomal reorganization of ERG is linked to Ewing sarcoma, myeloid leukemia and cervical carcinoma. (DeAlva et al., Int J Surg Pathol (2001) 9: 7-17; Simpson et al., Oncogene (1997) 14: 2149-2157; Shimizu et al., Proc Natl Acad Sci USA (1993) 90:10280-284; Papas, et al., Am J Med Genet Suppl. (1990) 7:251-261). ETS2 has been implicated in CaP, but it is over expressed only in a small proportion of CaP specimens. (Liu et al., Prostate (1997) 30:145-53; Semenchenko, et al., Oncogene (1998) 17:2883-88). ERG over expression without amplification of DNA copy number was recently reported in acute myeloid leukemia. (Balduc et al., Proc. Natl. Acad. Sci. USA (2004) 101:3915-20). Gavrilov et al., Eur J Cancer (2001) 37:1033-40 examined the expression of various transcription factors, including several proteins from the ETS family, in a very limited number of high-grade prostate cancer samples. Antibodies against the ETS family proteins, Elf-1 and Fli-1, caused intense staining of most of the high-grade prostate cancer samples. In contrast, ERG protein, while being detected in the noncancerous endothelial cells (microvessels in the stroma) of most samples tested, was detected in only a minority of the high-grade prostate cancers. ETS family proteins have a variety of expression patterns in human tissues. (Oikawa et al., Gene (2003) 303:11-34). ERG is expressed in endothelial tissues, hematopoietic cells, kidney, and in the urogenital tract. ERG proteins are nuclear transcription factors that form homodimers, as well as heterodimers with several other members of the ETS family of transcription factors. (Carrere et al., Oncogene (1998) 16(25):3261-68). A negative crosstalk observed between ERG and estrogen receptor (ER-alpha) may be relevant in urogenital tissues, where both transcription factors are expressed. (Vlaeminck-Guillem et al., Oncogene (2003) 22(50):8072-84). The present invention is based in part upon the surprising discovery that ERG is over expressed in the majority of CaP specimens analyzed, indicating that this oncogene plays a role in prostate tumorigenesis, most likely by modulating transcription of target genes favoring tumorigenesis in prostate epithelium.

The present invention is further based in part upon the over expression of the AMACR gene in prostate cancer epithelium. The cDNA sequence of the AMACR is publicly available through GenBank under the accession numbers NM_014324 and AF047020. The sequence (with start and stop codons underlined) corresponding to accession number NM_014324 is as follows:

(SEQ ID NO: 3)   1 gggattggga gggcttcttg caggctgctg ggctggggct aagggctgct cagtttcctt  61 cagcggggca ctgggaagcg ccatggcact gcagggcatc tcggtcgtgg agctgtccgg 121 cctggccccg ggcccgttct gtgctatggt cctggctgac ttcggggcgc gtgtggtacg 181 cgtggaccgg cccggctccc gctacgacgt gagccgcttg ggccggggca agcgctcgct 241 agtgctggac ctgaagcagc cgcggggagc cgccgtgctg cggcgtctgt gcaagcggtc 301 ggatgtgctg ctggagccct tccgccgcgg tgtcatggag aaactccagc tgggcccaga 361 gattctgcag cgggaaaatc caaggcttat ttatgccagg ctgagtggat ttggccagtc 421 aggaagcttc tgccggttag ctggccacga tatcaactat ttggctttgt caggtgttct 481 ctcaaaaatt ggcagaagtg gtgagaatcc gtatgccccg ctgaatctcc tggctgactt 541 tgctggtggt ggccttatgt gtgcactggg cattataatg gctctttttg accgcacacg 601 cactggcaag ggtcaggtca ttgatgcaaa tatggtggaa ggaacagcat atttaagttc 661 ttttctgtgg aaaactcaga aattgagtct gtgggaagca cctcgaggac agaacatgtt 721 ggatggtgga gcacctttct atacgactta caggacagca gatggggaat tcatggctgt 781 tggagcaata gaaccccagt tctacgagct gctgatcaaa ggacttggac taaagtctga 841 tgaacttccc aatcagatga gcatggatga ttggccagaa atgaagaaga agtttgcaga 901 tgtatttgca gagaagacga aggcagagtg gtgtcaaatc tttgacggca cagatgcctg 961 tgtgactccg gttctgactt ttgaggaggt tgttcatcat gatcacaaca aggaacgggg1021 ctcgtttatc accagtgagg agcaggacgt gagcccccgc cctgcacctc tgctgttaaa1081 caccccagcc atcccttctt tcaaaaggga tcctttcata ggagaacaca ctgaggagat1141 acttgaagaa tttggattca gccgcgaaga gatttatcag cttaactcag ataaaatcat1201 tgaaagtaat aaggtaaaag ctagtctcta acttccaggc ccacggctca agtgaatttg1261 aatactgcat ttacagtgta gagtaacaca taacattgta tgcatggaaa catggaggaa1321 cagtattaca gtgtcctacc actctaatca agaaaagaat tacagactct gattctacag1381 tgatgattga attctaaaaa tggttatcat tagggctttt gatttataaa actttgggta1441 cttatactaa attatggtag ttattctgcc ttccagtttg cttgatatat ttgttgatat1501 taagattctt gacttatatt ttgaatgggt tctagtgaaa aaggaatgat atattcttga1561 agacatcgat atacatttat ttacactctt gattctacaa tgtagaaaat gaggaaatgc1621 cacaaattgt atggtgataa aagtcacgtg aaacagagtg attggttgca tccaggcctt1681 ttgtcttggt 
gttcatgatc tccctctaag cacattccaa actttagcaa cagttatcac1741 actttgtaat ttgcaaagaa aagtttcacc tgtattgaat cagaatgcct tcaactgaaa1801 aaaacatatc caaaataatg aggaaatgtg ttggctcact acgtagagtc cagagggaca1861 gtcagtttta gggttgcctg tatccagtaa ctcggggcct gtttccccgt gggtctctgg1921 gctgtcagct ttcctttctc catgtgtttg atttctcctc aggctggtag caagttctgg1981 atcttatacc caacacacag caacatccag aaataaagat ctcaggaccc cccagcaagt2041 cgttttgtgt ctccttggac tgagttaagt tacaagcctt tcttatacct gtctttgaca2101 aagaagacgg gattgtcttt acataaaacc agcctgctcc tggagcttcc ctggactcaa2161 cttcctaaag gcatgtgagg aaggggtaga ttccacaatc taatccgggt gccatcagag2221 tagagggagt agagaatgga tgttgggtag gccatcaata aggtccattc tgcgcagtat2281 ctcaactgcc gttcaacaat cgcaagagga aggtggagca ggtttcttca tcttacagtt2341 gagaaaacag agactcagaa gggcttctta gttcatgttt cccttagcgc ctcagtgatt2401 ttttcatggt ggcttaggcc aaaagaaata tctaaccatt caatttataa ataattaggt2461 ccccaacgaa ttaaatatta tgtcctacca acttattagc tgcttgaaaa atataataca2521 cataaataaa aaaa

Nucleotides 83-1231 of SEQ ID NO:3 represent the coding sequence of AMACR.

The present invention is further based in part upon the over expression of the DD3 gene in prostate cancer epithelium. The cDNA sequence of the DD3 gene is publicly available through GenBank under the accession number AF103907. The sequence corresponding to accession number AF103907 is as follows:

(SEQ ID NO: 4)   1 acagaagaaa tagcaagtgc cgagaagctg gcatcagaaa aacagagggg agatttgtgt  61 ggctgcagcc gagggagacc aggaagatct gcatggtggg aaggacctga tgatacagag 121 gaattacaac acatatactt agtgtttcaa tgaacaccaa gataaataag tgaagagcta 181 gtccgctgtg agtctcctca gtgacacagg gctggatcac catcgacggc actttctgag 241 tactcagtgc agcaaagaaa gactacagac atctcaatgg caggggtgag aaataagaaa 301 ggctgctgac tttaccatct gaggccacac atctgctgaa atggagataa ttaacatcac 361 tagaaacagc aagatgacaa tataatgtct aagtagtgac atgtttttgc acatttccag 421 cccctttaaa tatccacaca cacaggaagc acaaaaggaa gcacagagat ccctgggaga 481 aatgcccggc cgccatcttg ggtcatcgat gagcctcgcc ctgtgcctgg tcccgcttgt 541 gagggaagga cattagaaaa tgaattgatg tgttccttaa aggatgggca ggaaaacaga 601 tcctgttgtg gatatttatt tgaacgggat tacagatttg aaatgaagtc acaaagtgag 661 cattaccaat gagaggaaaa cagacgagaa aatcttgatg gcttcacaag acatgcaaca 721 aacaaaatgg aatactgtga tgacatgagg cagccaagct ggggaggaga taaccacggg 781 gcagagggtc aggattctgg ccctgctgcc taaactgtgc gttcataacc aaatcatttc 841 atatttctaa ccctcaaaac aaagctgttg taatatctga tctctacggt tccttctggg 901 cccaacattc tccatatatc cagccacact catttttaat atttagttcc cagatctgta 961 ctgtgacctt tctacactgt agaataacat tactcatttt gttcaaagac ccttcgtgtt1021 gctgcctaat atgtagctga ctgtttttcc taaggagtgt tctggcccag gggatctgtg1081 aacaggctgg gaagcatctc aagatctttc cagggttata cttactagca cacagcatga1141 tcattacgga gtgaattatc taatcaacat catcctcagt gtctttgccc atactgaaat1201 tcatttccca cttttgtgcc cattctcaag acctcaaaat gtcattccat taatatcaca1261 ggattaactt ttttttttaa cctggaagaa ttcaatgtta catgcagcta tgggaattta1321 attacatatt ttgttttcca gtgcaaagat gactaagtcc tttatccctc ccctttgttt1381 gatttttttt ccagtataaa gttaaaatgc ttagccttgt actgaggctg tatacagcac1441 agcctctccc catccctcca gccttatctg tcatcaccat caacccctcc cataccacct1501 aaacaaaatc taacttgtaa ttccttgaac atgtcaggac atacattatt ccttctgcct1561 gagaagctct tccttgtctc ttaaatctag aatgatgtaa agttttgaat aagttgacta1621 tcttacttca tgcaaagaag ggacacatat gagattcatc atcacatgag acagcaaata1681 ctaaaagtgt 
aatttgatta taagagttta gataaatata tgaaatgcaa gagccacaga1741 gggaatgttt atggggcacg tttgtaagcc tgggatgtga agcaaaggca gggaacctca1801 tagtatctta tataatatac ttcatttctc tatctctatc acaatatcca acaagctttt1861 cacagaattc atgcagtgca aatccccaaa ggtaaccttt atccatttca tggtgagtgc1921 gctttagaat tttggcaaat catactggtc acttatctca actttgagat gtgtttgtcc1981 ttgtagttaa ttgaaagaaa tagggcactc ttgtgagcca ctttagggtt cactcctggc2041 aataaagaat ttacaaagag ctactcagga ccagttgtta agagctctgt gtgtgtgtgt2101 gtgtgtgtgt gagtgtacat gccaaagtgt gcctctctct cttgacccat tatttcagac2161 ttaaaacaag catgttttca aatggcacta tgagctgcca atgatgtatc accaccatat2221 ctcattattc tccagtaaat gtgataataa tgtcatctgt taacataaaa aaagtttgac2281 ttcacaaaag cagctggaaa tggacaacca caatatgcat aaatctaact cctaccatca2341 gctacacact gcttgacata tattgttaga agcacctcgc atttgtgggt tctcttaagc2401 aaaatacttg cattaggtct cagctggggc tgtgcatcag gcggtttgag aaatattcaa2461 ttctcagcag aagccagaat ttgaattccc tcatctttta ggaatcattt accaggtttg2521 gagaggattc agacagctca ggtgctttca ctaatgtctc tgaacttctg tccctctttg2581 tgttcatgga tagtccaata aataatgtta tctttgaact gatgctcata ggagagaata2641 taagaactct gagtgatatc aacattaggg attcaaagaa atattagatt taagctcaca2701 ctggtcaaaa ggaaccaaga tacaaagaac tctgagctgt catcgtcccc atctctgtga2761 gccacaacca acagcaggac ccaacgcatg tctgagatcc ttaaatcaag gaaaccagtg2821 tcatgagttg aattctccta ttatggatgc tagcttctgg ccatctctgg ctctcctctt2881 gacacatatt agcttctagc ctttgcttcc acgactttta tcttttctcc aacacatcgc2941 ttaccaatcc tctctctgct ctgttgcttt ggacttcccc acaagaattt caacgactct3001 caagtctttt cttccatccc caccactaac ctgaatgcct agacccttat ttttattaat3061 ttccaataga tgctgcctat gggctatatt gctttagatg aacattagat atttaaagct3121 caagaggttc aaaatccaac tcattatctt ctctttcttt cacctccctg ctcctctccc3181 tatattactg attgcactga acagcatggt ccccaatgta gccatgcaaa tgagaaaccc3241 agtggctcct tgtggtacat gcatgcaaga ctgctgaagc cagaaggatg actgattacg3301 cctcatgggt ggaggggacc actcctgggc cttcgtgatt gtcaggagca agacctgaga3361 tgctccctgc cttcagtgtc ctctgcatct cccctttcta 
atgaagatcc atagaatttg3421 ctacatttga gaattccaat taggaactca catgttttat ctgccctatc aattttttaa3481 acttgctgaa aattaagttt tttcaaaatc tgtccttgta aattactttt tcttacagtg3541 tcttggcata ctatatcaac tttgattctt tgttacaact tttcttactc ttttatcacc3601 aaagtggctt ttattctctt tattattatt attttctttt actactatat tacgttgtta3661 ttattttgtt ctctatagta tcaatttatt tgatttagtt tcaatttatt tttattgctg3721 acttttaaaa taagtgattc ggggggtggg agaacagggg agggagagca ttaggacaaa3781 tacctaatgc atgtgggact taaaacctag atgatgggtt gataggtgca gcaaaccact3841 atggcacacg tatacctgtg taacaaacct acacattctg cacatgtatc ccagaacgta3901 aagtaaaatt taaaaaaaag tga

The DD3 gene appears to represent a non-coding nucleic acid. Therefore, no start and stop codons have been indicated.

The present invention is further based in part upon the under expression of the LTF gene in prostate cancer epithelium. The cDNA sequence of the lactotransferrin (LTF) gene is publicly available through GenBank under the accession number NM_002343. The sequence (with start and stop codons underlined) corresponding to accession number NM_002343 is as follows:

(SEQ ID NO: 5)   1 agagccttcg tttgccaagt cgcctccaga ccgcagacat gaaacttgtc ttcctcgtcc  61 tgctgttcct cggggccctc ggactgtgtc tggctggccg taggaggagt gttcagtggt 121 gcgccgtatc ccaacccgag gccacaaaat gcttccaatg gcaaaggaat atgagaaaag 181 tgcgtggccc tcctgtcagc tgcataaaga gagactcccc catccagtgt atccaggcca 241 ttgcggaaaa cagggccgat gctgtgaccc ttgatggtgg tttcatatac gaggcaggcc 301 tggcccccta caaactgcga cctgtagcgg cggaagtcta cgggaccgaa agacagccac 361 gaactcacta ttatgccgtg gctgtggtga agaagggcgg cagctttcag ctgaacgaac 421 tgcaaggtct gaagtcctgc cacacaggcc ttcgcaggac cgctggatgg aatgtcccta 481 tagggacact tcgtccattc ttgaattgga cgggtccacc tgagcccatt gaggcagctg 541 tggccaggtt cttctcagcc agctgtgttc ccggtgcaga taaaggacag ttccccaacc 601 tgtgtcgcct gtgtgcgggg acaggggaaa acaaatgtgc cttctcctcc caggaaccgt 661 acttcagcta ctctggtgcc ttcaagtgtc tgagagacgg ggctggagac gtggctttta 721 tcagagagag cacagtgttt gaggacctgt cagacgaggc tgaaagggac gagtatgagt 781 tactctgccc agacaacact cggaagccag tggacaagtt caaagactgc catctggccc 841 gggtcccttc tcatgccgtt gtggcacgaa gtgtgaatgg caaggaggat gccatctgga 901 atcttctccg ccaggcacag gaaaagtttg gaaaggacaa gtcaccgaaa ttccagctct 961 ttggctcccc tagtgggcag aaagatctgc tgttcaagga ctctgccatt gggttttcga1021 gggtgccccc gaggatagat tctgggctgt accttggctc cggctacttc actgccatcc1081 agaacttgag gaaaagtgag gaggaagtgg ctgcccggcg tgcgcgggtc gtgtggtgtg1141 cggtgggcga gcaggagctg cgcaagtgta accagtggag tggcttgagc gaaggcagcg1201 tgacctgctc ctcggcctcc accacagagg actgcatcgc cctggtgctg aaaggagaag1261 ctgatgccat gagtttggat ggaggatatg tgtacactgc aggcaaatgt ggtttggtgc1321 ctgtcctggc agagaactac aaatcccaac aaagcagtga ccctgatcct aactgtgtgg1381 atagacctgt ggaaggatat cttgctgtgg cggtggttag gagatcagac actagcctta1441 cctggaactc tgtgaaaggc aagaagtcct gccacaccgc cgtggacagg actgcaggct1501 ggaatatccc catgggcctg ctcttcaacc agacgggctc ctgcaaattt gatgaatatt1561 tcagtcaaag ctgtgcccct gggtctgacc cgagatctaa tctctgtgct ctgtgtattg1621 gcgacgagca gggtgagaat aagtgcgtgc ccaacagcaa cgagagatac tacggctaca1681 ctggggcttt 
ccggtgcctg gctgagaatg ctggagacgt tgcatttgtg aaagatgtca1741 ctgtcttgca gaacactgat ggaaataaca atgaggcatg ggctaaggat ttgaagctgg1801 cagactttgc gctgctgtgc ctcgatggca aacggaagcc tgtgactgag gctagaagct1861 gccatcttgc catggccccg aatcatgccg tggtgtctcg gatggataag gtggaacgcc1921 tgaaacaggt gttgctccac caacaggcta aatttgggag aaatggatct gactgcccgg1981 acaagttttg cttattccag tctgaaacca aaaaccttct gttcaatgac aacactgagt2041 gtctggccag actccatggc aaaacaacat atgaaaaata tttgggacca cagtatgtcg2101 caggcattac taatctgaaa aagtgctcaa cctcccccct cctggaagcc tgtgaattcc2161 tcaggaagta aaaccgaaga agatggccca gctccccaag aaagcctcag ccattcactg2221 cccccagctc ttctccccag gtgtgttggg gccttggcct cccctgctga aggtggggat2281 tgcccatcca tctgcttaca attccctgct gtcgtcttag caagaagtaa aatgagaaat2341 tttgttgata ttctctcctt aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa

Nucleotides 39-2171 of SEQ ID NO:5 represent the coding sequence of LTF.
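
As a quick consistency check on these coordinates, the following minimal Python sketch converts a 1-based, inclusive nucleotide range into a length and codon count. It is illustrative only and not part of the original disclosure; the function name `cds_summary` is hypothetical, and the sketch assumes that the last codon in the range is a stop codon.

```python
def cds_summary(start: int, end: int) -> dict:
    """Summarize a coding region given 1-based, inclusive coordinates."""
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    codons = length // 3
    return {
        "length_nt": length,
        "codons": codons,
        # Assumption: the final codon is a stop codon and encodes no residue.
        "residues_excluding_stop": codons - 1,
    }


def extract_cds(sequence: str, start: int, end: int) -> str:
    """Slice a sequence using 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    print(cds_summary(39, 2171))
```

For the range 39-2171 this gives 2,133 nucleotides, that is, 711 codons including the assumed stop codon.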

LTF is a non-heme iron binding glycoprotein and a member of the transferrin gene family. Bowman et al., Adv. Genet. 25:1-38 (1988); Park et al., Proc. Natl. Acad. Sci. U.S.A., 82:3149-53 (1985). The concentration of LTF in human prostate is hormone dependent and its expression is regulated by estrogen. van Sande et al., Urol. Res., 9(5):241-44 (1981); Teng et al., Biochem. Cell Biol., 80:7-16 (2002); Teng et al., Mol. Human. Reproduction., 8(1):58-67 (2002). LTF has also been implicated in certain cancers. For example, bovine LTF inhibits colon, esophagus, lung, and bladder carcinomas in rats. Tsuda et al., Biochem. Cell Biol., 80:131-136 (2002); Tsuda et al., Biofactors., 12(1-4):83-8 (2000); Tsuda et al., Mutat Res., 462(2-3):227-33 (2000). In a study published over 20 years ago, van Sande et al., Urol. Res. 9:241-244 (1981), examined lactoferrin protein levels in human benign prostatic hypertrophy samples. They also detected low levels of lactoferrin protein in 3 carcinoma samples. However, we are the first to report the consistent and significant under expression of LTF mRNA in prostate cancer epithelial cells from a large number of patient samples. The observed under expression of LTF mRNA in such a statistically significant sample size indicates that under expression of LTF is a useful diagnostic marker for prostate cancer.

In one experiment, when screened using the Affymetrix GeneChip, CaP tumor cells exhibited upregulated AMACR expression in comparison to matched benign cells. In this studied patient cohort (n=73), AMACR was upregulated in tumor compared to matched benign prostate epithelium in 89.04% of the patients (65 of 73), while ERG was upregulated in 78.08% (57 of 73). When these two markers were combined, we observed a 100% CaP detection rate (under the criterion that at least one marker was upregulated) in the studied patient cohort (73 of 73). These data indicate that the combination of ERG and AMACR screening provides a highly accurate tool for CaP detection.
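
The percentages in this cohort follow directly from the reported counts. The minimal Python sketch below reproduces them and, by inclusion-exclusion, derives the number of patients in whom both markers must have been upregulated. The function names are hypothetical, and the overlap figure is implied by the reported counts rather than stated in the text.

```python
def detection_rate(positive: int, total: int) -> float:
    """Percentage of patients called positive, rounded to two decimals."""
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need 0 <= positive <= total and total > 0")
    return round(100.0 * positive / total, 2)


def implied_overlap(a_only_or_both: int, b_only_or_both: int, either: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return a_only_or_both + b_only_or_both - either


if __name__ == "__main__":
    cohort = 73
    print("AMACR:", detection_rate(65, cohort))         # reported as 89.04%
    print("ERG:", detection_rate(57, cohort))           # reported as 78.08%
    print("ERG or AMACR:", detection_rate(73, cohort))  # reported as 100%
    print("both upregulated:", implied_overlap(65, 57, 73))
```

Under these counts, 49 of the 73 patients would have shown upregulation of both AMACR and ERG, and every remaining patient showed upregulation of exactly one of the two.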

In another experiment, 96.4% of patients showed upregulation of either the ERG or AMACR gene in laser microdissected matched tumor and benign prostate epithelial cells from 55 CaP patients (FIG. 5). Similarly, 96.4% of patients showed upregulation of either the ERG or DD3 gene (FIG. 5). When the expression data for the ERG, AMACR, and DD3 genes were combined, 98.2% of the CaP patients showed upregulation of at least one of the three genes in tumor cells (FIG. 5). Thus, the combination of ERG, AMACR, and DD3 screening also provides a highly accurate tool for CaP detection.
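
The "at least one marker upregulated" rule used for these combinations can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually performed: the per-patient calls below are invented toy data, and the function name `combined_detection` is hypothetical. The text does not state here how an individual upregulation call was made, so the sketch takes boolean calls as input.

```python
from typing import Dict, List


def combined_detection(calls: List[Dict[str, bool]], markers: List[str]) -> float:
    """Fraction of patients with at least one of the listed markers upregulated."""
    if not calls:
        raise ValueError("no patients supplied")
    hits = sum(any(patient[m] for m in markers) for patient in calls)
    return hits / len(calls)


if __name__ == "__main__":
    # Toy data only: True means "upregulated in tumor versus matched benign cells".
    patients = [
        {"ERG": True, "AMACR": False, "DD3": False},
        {"ERG": False, "AMACR": True, "DD3": True},
        {"ERG": False, "AMACR": False, "DD3": True},
        {"ERG": False, "AMACR": False, "DD3": False},
    ]
    print(combined_detection(patients, ["ERG", "AMACR"]))
    print(combined_detection(patients, ["ERG", "DD3"]))
    print(combined_detection(patients, ["ERG", "AMACR", "DD3"]))
```

As a rounding check on the reported figures, 96.4% and 98.2% of a 55-patient cohort are consistent with 53 and 54 patients, respectively.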

In yet another experiment, validation by QRT-PCR (TaqMan) in microdissected tumor and benign prostate epithelial cells of 20 CaP patients confirmed a consistent, tumor associated under expression of LTF in 100% of patients (20 of 20) (FIG. 1D). Further validation studies by QRT-PCR in microdissected tumor and benign prostate epithelial cells of 103 CaP patients were consistent with the initial results, showing tumor associated under expression in 76% of patients (78 of 103).

Diagnostic Uses

In one embodiment, the present invention comprises a method of CaP diagnosis comprising screening biological samples for CaP-cell-specific gene expression signatures. In particular, the invention comprises screening for at least one of the CaP-cell-specific genes listed in Tables 1-6, particularly the ERG gene, the AMACR gene, the LTF gene, or a combination of the ERG and AMACR genes. The invention also comprises methods of diagnosing CaP comprising screening biological samples for expression of the ERG and DD3 genes, or a combination of the ERG, DD3, and AMACR genes.

In a further embodiment, the present invention comprises a method of CaP diagnosis comprising screening biological samples for CaP-cell-specific gene expression signatures using methods known in the art, including, for example, immunohistochemistry, ELISA, in situ RNA hybridization, and any oligonucleotide amplification procedure known or later developed, including PCR (including QRT-PCR), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), self-sustained sequence replication (3SR), ligase chain reaction (LCR), strand displacement amplification (SDA), and Loop-Mediated Isothermal Amplification (LAMP). See, e.g., Mullis, U.S. Pat. No. 4,683,202; Erlich et al., U.S. Pat. No. 6,197,563; Walker et al., Nucleic Acids Res., 20:1691-1696 (1992); Fahy et al., PCR Methods and Applications, 1:25-33 (1991); Kacian et al., U.S. Pat. No. 5,399,491; Kacian et al., U.S. Pat. No. 5,480,784; Davey et al., U.S. Pat. No. 5,554,517; Birkenmeyer et al., U.S. Pat. No. 5,427,930; Marshall et al., U.S. Pat. No. 5,686,272; Walker, U.S. Pat. No. 5,712,124; Notomi et al., European Patent Application No. 1 020 534 A1; Dattagupta et al., U.S. Pat. No. 6,214,587; and HELEN H. LEE ET AL., NUCLEIC ACID AMPLIFICATION TECHNOLOGIES: APPLICATION TO DISEASE DIAGNOSIS (1997). Each of the foregoing amplification references is hereby incorporated by reference herein. In particular, the invention comprises generating antibodies to CaP-cell-specific genes, including ERG, AMACR, LTF, and DD3, for use in an immunohistochemistry assay. Other known diagnostic assays may be used to detect gene expression.

In a specific embodiment, the present invention comprises a method of diagnosing CaP comprising screening biological samples for expression of the ERG and AMACR genes, the ERG and DD3 genes, the ERG, AMACR, and DD3 genes, or the LTF gene using methods known in the art, including, for example, immunohistochemistry, ELISA, in situ hybridization, PCR (including QRT-PCR), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), self-sustained sequence replication (3SR), ligase chain reaction (LCR), strand displacement amplification (SDA), and Loop-Mediated Isothermal Amplification (LAMP).

ERG, LTF, or AMACR polypeptides, their fragments or other derivatives, or analogs thereof, may be used as immunogens in order to generate antibodies that specifically bind such immunogens. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain and Fab fragments. In a specific embodiment, antibodies to a human ERG, LTF or AMACR protein are produced. Antibodies can then be used in standard diagnostic assays to detect the protein produced by the desired gene.

Various procedures known in the art may be used for the production of polyclonal antibodies to an ERG, LTF, or AMACR protein or derivative or analog. In a particular embodiment, rabbit polyclonal antibodies to an epitope of an ERG, LTF, or AMACR protein can be obtained. For the production of antibody, various host animals, including but not limited to rabbits, mice, rats, etc., can be immunized by injection with the native ERG, LTF, or AMACR protein, or a synthetic version, or derivative (e.g., fragment) thereof. Various adjuvants may be used to increase the immunological response, depending on the host species, including, but not limited to, Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed toward an ERG, LTF, or AMACR protein sequence or analog thereof, any technique that provides for the production of antibody molecules by continuous cell lines in culture may be used. Examples include the hybridoma technique originally developed by Kohler et al. (1975) Nature, 256:495-497, as well as the trioma technique, the human B-cell hybridoma technique (Kozbor et al. (1983) Immunology Today, 4:72), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al. (1985) Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). According to the invention, human antibodies may be used and can be obtained by using human hybridomas (Cote et al. (1983) Proc. Natl. Acad. Sci. U.S.A., 80:2026-2030) or by transforming human B cells with EBV in vitro (Cole et al. (1985) Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, pp. 77-96). According to the invention, techniques developed for the production of chimeric antibodies (Morrison et al. (1984) Proc. Natl. Acad. Sci. U.S.A., 81:6851-6855; Neuberger et al. (1984) Nature, 312:604-608; Takeda et al. (1985) Nature, 314:452-454) by splicing the genes from a mouse antibody molecule specific for ERG, LTF, or AMACR together with genes from a human antibody molecule of appropriate biological activity can be used; such antibodies are within the scope of this invention.

Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be used to produce ERG-, LTF-, or AMACR-specific single chain antibodies. An additional embodiment of the invention utilizes the techniques described for the construction of Fab expression libraries (Huse et al. (1989) Science, 246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity for ERG, LTF or AMACR proteins, derivatives, or analogs.

Antibody fragments which contain the idiotype of the molecule can be generated by known techniques. For example, such fragments include but are not limited to: the F(ab′)₂ fragment, which can be produced by pepsin digestion of the antibody molecule; the Fab′ fragments, which can be generated by reducing the disulfide bridges of the F(ab′)₂ fragment; the Fab fragments, which can be generated by treating the antibody molecule with papain and a reducing agent; and Fv fragments, including single chain Fv (scFv) fragments.

In the production of antibodies, screening for the desired antibody can be accomplished by techniques known in the art, e.g., ELISA. For example, to select antibodies that recognize a specific domain of an ERG, LTF, or AMACR protein, one may assay generated hybridomas for a product which binds to an ERG, LTF, or AMACR fragment containing such domain.

A second aspect of the invention provides for use of the expression profiles resulting from these methods in diagnostic methods, including, but not limited to, characterizing the treatment response to any therapy, correlating expression profiles with clinico-pathologic features, distinguishing indolent prostate cancers from those with a more aggressive phenotype (e.g., moderate risk versus high risk), analyzing tumor specimens of patients treated by radical prostate surgery to help define prognosis, screening candidate genes for the development of a polynucleotide array for use as a blood test for improved prostate cancer detection, and identifying further genes that may serve as biomarkers for response to treatment to screen drugs for the treatment of advanced prostate cancer.

As will be readily appreciated by persons having skill in the art, the ERG, LTF, DD3, and/or the AMACR nucleic acid sequences described herein can easily be synthesized directly on a support, or pre-synthesized polynucleotide probes may be affixed to a support as described, for example, in U.S. Pat. Nos. 5,744,305, 5,837,832, and 5,861,242, each of which is incorporated herein by reference.

Such arrays may be used to detect specific nucleic acid sequences contained in a target cell or sample, as described in U.S. Pat. Nos. 5,744,305, 5,837,832, and 5,861,242, each of which is incorporated herein by reference. More specifically, in the present invention, these arrays may be used in methods for the diagnosis or prognosis of prostate cancer, such as by assessing the expression profiles of genes in biological samples. In a preferred embodiment, computer models may be developed for the analysis of expression profiles. Moreover, such polynucleotide arrays are useful in methods to screen drugs for the treatment of advanced prostate cancer. In these screening methods, the polynucleotide arrays are used to analyze how drugs affect the expression of the ERG, LTF, AMACR, and/or DD3 genes.

Therapeutic Uses

The invention provides for treatment or prevention of various diseases and disorders by administration of a therapeutic compound (termed herein “therapeutic”). “Therapeutics” include but are not limited to: ERG or LTF proteins and analogs and derivatives (including fragments) thereof (e.g., as described herein above); nucleic acids encoding the ERG or LTF proteins, analogs, or derivatives; ERG or LTF antisense nucleic acids; ERG or LTF dominant negative mutants; siRNA against ERG or LTF; ERG or LTF antibodies; and ERG or LTF agonists and antagonists. ERG or LTF agonists and antagonists, including small molecules, can be identified using the methods disclosed in this application or any standard screening assay to identify agents that modulate ERG or LTF expression or function, particularly in prostate cancer cells. For example, ERG or LTF expression or function can be readily detected, e.g., by obtaining a biological sample from a patient, e.g., a tissue sample (e.g., from biopsy tissue), a blood sample, or a urine sample, and assaying it in vitro for mRNA or protein levels, structure and/or activity of the expressed ERG or LTF mRNA or protein. Many methods standard in the art can be employed, including but not limited to, kinase assays, immunoassays to detect and/or visualize ERG or LTF protein (e.g., Western blot, immunoprecipitation followed by SDS-PAGE, immunocytochemistry, etc.) and/or hybridization assays to detect ERG or LTF expression by detecting and/or visualizing ERG or LTF mRNA (e.g., Northern assays, dot blots, in situ hybridization, PCR (including RT-PCR), TMA, NASBA, 3SR, LCR, SDA, LAMP, etc.).

Hyperproliferative Disorders

Disorders involving hyperproliferation of cells are treated or prevented by administration of a therapeutic that antagonizes (reduces or inhibits) ERG function or expression or enhances LTF function or expression. In certain embodiments, ERG function is inhibited by use of ERG antisense nucleic acids. The present invention provides the therapeutic or prophylactic use of nucleic acids of at least 10, 15, 100, 200, 500, 1000, 1500, 2000, or 2500 contiguous nucleotides in antisense orientation to any of the ERG nucleotide sequences described herein. In a particular embodiment, the ERG antisense nucleic acid comprises at least 10, 15, 100, 200, 500, 1000, 1500, 2000, or 2500 contiguous nucleotides in antisense orientation to the ERG nucleotide sequence. An ERG “antisense” nucleic acid as used herein refers to a nucleic acid capable of hybridizing under defined conditions to a portion of an ERG nucleic acid by virtue of some sequence complementarity. The antisense nucleic acid may be complementary to a coding and/or noncoding region of an ERG nucleic acid. Such antisense nucleic acids have utility as therapeutics that inhibit ERG function, and can be used in the treatment or prevention of disorders as described herein.
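
Sequence complementarity in the antisense orientation can be illustrated with a reverse-complement calculation. The minimal Python sketch below is an illustration only: the input is an arbitrary placeholder, not an ERG sequence, and the function names are hypothetical. It computes a fully complementary strand, whereas the definition above requires only "some sequence complementarity."

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(sense: str) -> str:
    """Return the reverse complement (5'->3') of a DNA sequence."""
    seq = sense.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


def meets_min_length(oligo: str, minimum: int = 10) -> bool:
    """Check an oligonucleotide against a minimum contiguous length."""
    return len(oligo) >= minimum


if __name__ == "__main__":
    placeholder = "ATGGCCAGTA"  # arbitrary 10-mer, NOT taken from ERG
    oligo = antisense_dna(placeholder)
    print(oligo, meets_min_length(oligo))
```

Applying the function twice returns the original sequence, which is a convenient sanity check when preparing antisense oligonucleotide designs.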

The antisense nucleic acids of the invention can be oligonucleotides that are double-stranded or single-stranded, RNA or DNA or a modification or derivative thereof, which can be directly administered to a cell, or which can be produced intracellularly by transcription of exogenously introduced coding sequences.

The dominant negative mutants of the invention can be produced by expression plasmids containing a nucleic acid encoding a non-functional domain of ERG, such as the DNA binding domain of ERG. These expression plasmids can be introduced into a target cell or tissue and can induce tumor growth inhibition and apoptosis by acting as a dominant negative form against the wild-type ERG transcription factors influencing cell hyperproliferation (Oikawa, Cancer Sci (2004), 95:626-33).

RNA interference can be achieved using siRNA against the ERG gene. The siRNA is a short double stranded RNA molecule of about 18-25 nucleotides that comprises a nucleotide sequence complementary to a region of the target gene. The siRNA can be introduced into a target cell or tissue, for example using an expression plasmid, where it interferes with the translation of the ERG gene. RNA interference techniques can be carried out using known methods as described, for example, in published U.S. Patent Applications 20040192626, 20040181821, and 20030148519, each of which is incorporated by reference.
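
The relationship between a target region and a complementary siRNA strand of 18-25 nucleotides can be sketched as a sliding-window enumeration. The Python sketch below is a simplified illustration and not a design method from the cited applications: it applies no selection criteria (such as base composition or off-target screening), the function name `sirna_candidates` is hypothetical, and the input is a synthetic repeat rather than an ERG sequence.

```python
from typing import Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sirna_candidates(target_dna: str, length: int = 21) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based start, sense RNA, complementary guide RNA) for each window."""
    if not 18 <= length <= 25:
        raise ValueError("length must be within the 18-25 nucleotide range")
    target = target_dna.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    for i in range(len(target) - length + 1):
        window = target[i:i + length]
        # Guide strand: reverse complement of the window, written as RNA.
        guide = window.translate(_COMPLEMENT)[::-1].replace("T", "U")
        yield i + 1, window.replace("T", "U"), guide


if __name__ == "__main__":
    synthetic_target = "ACGT" * 6  # 24-nt synthetic repeat, NOT an ERG sequence
    for start, sense, guide in sirna_candidates(synthetic_target, length=21):
        print(start, sense, guide)
```

In practice, candidate windows would be filtered further before any experimental use; the enumeration only shows how each window and its complementary strand relate.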

Therapeutics which are useful according to this embodiment of the invention for treatment of a disorder may be selected by testing for biological activity in promoting the survival or differentiation of cells. For example, in a specific embodiment relating to cancer therapy, including therapy of prostate cancer, a therapeutic decreases proliferation of tumor cells. These effects can be measured as described in the Examples or using any other method standard in the art.

In specific embodiments, malignancy or dysproliferative changes (such as metaplasias and dysplasias), or hyperproliferative disorders, are treated or prevented in the prostate.

The therapeutics of the invention that antagonize ERG activity can also be administered to treat premalignant conditions and to prevent progression to a neoplastic or malignant state, including but not limited to those disorders described herein, such as prostate cancer.

Gene Therapy

In a specific embodiment, nucleic acids comprising a sequence encoding an ERG or LTF protein or functional derivative thereof are administered to promote ERG or LTF function, by way of gene therapy. Alternatively, nucleic acids comprising an antisense ERG sequence are administered to antagonize ERG expression or function. Gene therapy refers to therapy performed by the administration of a nucleic acid to a subject.

Any of the methods for gene therapy available in the art can be used according to the present invention. For specific protocols, see Morgan (2001) Gene Therapy Protocols, 2nd ed., Humana Press. For general reviews of the methods of gene therapy, see Goldspiel et al. (1993) Clinical Pharmacy, 12:488-505; Wu et al. (1991) Biotherapy, 3:87-95; Tolstoshev (1993) Ann. Rev. Pharmacol. Toxicol., 32:573-596; Mulligan (1993) Science, 260:926-932; Morgan et al. (1993) Ann. Rev. Biochem., 62:191-217; and May (1993) TIBTECH, 11(5):155-215. Methods commonly known in the art of recombinant DNA technology which can be used are described in Current Protocols in Molecular Biology (2004), Ausubel et al., eds., John Wiley & Sons, NY; and Kriegler (1990) Gene Transfer and Expression, A Laboratory Manual, Stockton Press, NY.

In one embodiment, the therapeutic comprises an ERG or LTF nucleic acid or antisense ERG nucleic acid that is part of a vector. In particular, such a nucleic acid has a regulatory sequence, such as a promoter, operably linked to the ERG or LTF coding region or antisense molecule, said promoter being inducible or constitutive, and, optionally, tissue-specific. In another particular embodiment, a nucleic acid molecule is used in which the ERG or LTF coding sequences and any other desired sequences are flanked by regions that promote homologous recombination at a desired site in the genome, thus providing for intrachromosomal expression of the ERG or LTF nucleic acid (Koller et al. (1989) Proc. Natl. Acad. Sci. U.S.A., 86:8932-8935; Zijlstra et al. (1989) Nature, 342:435-438).

In a specific embodiment, the nucleic acid to be introduced for purposes of gene therapy comprises an inducible promoter operably linked to the desired nucleic acids, such that expression of the nucleic acid is controllable by the appropriate inducer of transcription.

Delivery of the nucleic acid into a patient may be either direct, in which case the patient is directly exposed to the nucleic acid or nucleic acid-carrying vector, or indirect, in which case cells are first transformed with the nucleic acid in vitro, then transplanted into the patient. These two approaches are known, respectively, as in vivo and ex vivo gene therapy.

In a specific embodiment, the nucleic acid is directly administered in vivo, where it is expressed to produce the encoded product. This can be accomplished by any of numerous methods known in the art, e.g., by constructing it as part of an appropriate nucleic acid expression vector and administering it so that it becomes intracellular, e.g., by infection using a defective or attenuated retroviral or other viral vector (see U.S. Pat. No. 4,980,286, which is incorporated herein by reference), or by direct injection of naked DNA, or by use of microparticle bombardment (e.g., a gene gun; Biolistic, DuPont), or coating with lipids or cell-surface receptors or transfecting agents, encapsulation in liposomes, microparticles, or microcapsules, or by administering it in linkage to a peptide which is known to enter the nucleus, or by administering it in linkage to a ligand subject to receptor-mediated endocytosis (see, e.g., Wu et al. (1987) J. Biol. Chem., 262:4429-4432). In another embodiment, a nucleic acid-ligand complex can be formed in which the ligand comprises a fusogenic viral peptide to disrupt endosomes, allowing the nucleic acid to avoid lysosomal degradation. In yet another embodiment, the nucleic acid can be targeted in vivo for cell-specific uptake and expression, by targeting a specific receptor (see, e.g., PCT Pubs. WO 92/06180; WO 92/22635; WO 92/20316; WO 93/14188; WO 93/20221). Alternatively, the nucleic acid can be introduced intracellularly and incorporated within host cell DNA for expression, by homologous recombination (Koller et al. (1989) Proc. Natl. Acad. Sci. U.S.A., 86:8932-8935; Zijlstra et al. (1989) Nature, 342:435-438).

In a specific embodiment, a viral vector that contains an ERG or LTF nucleic acid is used. For example, a retroviral vector can be used (see Miller et al. (1993) Meth. Enzymol., 217:581-599). These retroviral vectors have been modified to delete retroviral sequences that are not necessary for packaging of the viral genome and integration into host cell DNA. The ERG or LTF nucleic acid to be used in gene therapy is cloned into the vector, which facilitates delivery of the gene into a patient. More detail about retroviral vectors can be found in Boesen et al. (1994) Biotherapy, 6:291-302, which describes the use of a retroviral vector to deliver the mdr1 gene to hematopoietic stem cells in order to make the stem cells more resistant to chemotherapy. Other references illustrating the use of retroviral vectors in gene therapy are: Clowes et al. (1994) J. Clin. Invest., 93:644-651; Kiem et al. (1994) Blood, 83:1467-1473; Salmons et al. (1993) Hum. Gene Ther., 4:129-141; and Grossman et al. (1993) Curr. Opin. Gen. Devel., 3:110-114.

Adenoviruses are other viral vectors that can be used in gene therapy. Adenoviruses are especially attractive vehicles for delivering genes to respiratory epithelia. Adenoviruses naturally infect respiratory epithelia where they cause a mild disease. Other targets for adenovirus-based delivery systems are liver, the central nervous system, endothelial cells, and muscle. Adenoviruses have the advantage of being capable of infecting non-dividing cells. Kozarsky et al. (1993, Curr. Opin. Gen. Devel., 3:499-503) present a review of adenovirus-based gene therapy. Bout et al. (1994, Hum. Gene Ther., 5:3-10) demonstrated the use of adenovirus vectors to transfer genes to the respiratory epithelia of rhesus monkeys. Other instances of the use of adenoviruses in gene therapy can be found in Rosenfeld et al. (1991) Science, 252:431-434; Rosenfeld et al. (1992) Cell, 68:143-155; and Mastrangeli et al. (1993) J. Clin. Invest., 91:225-234.

Adeno-associated virus (AAV) has also been proposed for use in gene therapy (Walsh et al. (1993) Proc. Soc. Exp. Biol. Med., 204:289-300).

Another approach to gene therapy involves transferring a gene to cells in tissue culture by such methods as electroporation, lipofection, calcium phosphate mediated transfection, or viral infection. Usually, the method of transfer includes the transfer of a selectable marker to the cells. The cells are then placed under selection to isolate those cells that have taken up and are expressing the transferred gene. Those cells are then delivered to a patient.

In this embodiment, the nucleic acid is introduced into a cell prior to administration in vivo of the resulting recombinant cell. Such introduction can be carried out by any method known in the art, including but not limited to transfection, electroporation, microinjection, infection with a viral or bacteriophage vector containing the nucleic acid sequences, cell fusion, chromosome-mediated gene transfer, microcell-mediated gene transfer, spheroplast fusion, etc. Numerous techniques are known in the art for the introduction of foreign genes into cells (see, e.g., Loeffler et al. (1993) Meth. Enzymol., 217:599-618; Cohen et al. (1993) Meth. Enzymol., 217:618-644; Cline (1985) Pharmac. Ther., 29:69-92) and may be used in accordance with the present invention, provided that the necessary developmental and physiological functions of the recipient cells are not disrupted. The technique should provide for the stable transfer of the nucleic acid to the cell, so that the nucleic acid is expressible by the cell and preferably heritable and expressible by its cell progeny.

The resulting recombinant cells can be delivered to a patient by various methods known in the art. In one preferred embodiment, epithelial cells are injected, e.g., subcutaneously. In another embodiment, recombinant skin cells may be applied as a skin graft onto the patient. Recombinant blood cells (e.g., hematopoietic stem or progenitor cells) may be administered intravenously. The amount of cells envisioned for use depends on the desired effect, patient state, etc., and can be determined by one skilled in the art.

Cells into which a nucleic acid can be introduced for purposes of gene therapy encompass any desired, available cell type, and include, but are not limited to, epithelial cells, endothelial cells, keratinocytes, fibroblasts, muscle cells, hepatocytes, T lymphocytes, B lymphocytes, monocytes, macrophages, neutrophils, eosinophils, megakaryocytes, granulocytes; various stem or progenitor cells, in particular hematopoietic stem or progenitor cells, e.g., as obtained from bone marrow, umbilical cord blood, peripheral blood, fetal liver, etc. In certain embodiments, the cells used for gene therapy are autologous to the patient.

In one embodiment, an ERG or LTF nucleic acid or antisense molecule is introduced into the cells such that it is expressible by the cells or their progeny, and the recombinant cells are then administered in vivo for therapeutic effect. In a specific embodiment, stem or progenitor cells are used. Any stem and/or progenitor cells which can be isolated and maintained in vitro can potentially be used in accordance with this embodiment of the present invention. Such stem cells include, but are not limited to, hematopoietic stem cells (HSC), stem cells of epithelial tissues such as the skin and the lining of the gut, embryonic heart muscle cells, liver stem cells (PCT Pub. WO 94/08598), and neural stem cells (Stemple et al. (1992) Cell, 71:973-985).

Epithelial stem cells (ESCs) or keratinocytes can be obtained from tissues such as the skin and the lining of the gut by known procedures (Rheinwald (1980) Meth. Cell Bio., 21A:229). In stratified epithelial tissue such as the skin, renewal occurs by mitosis of stem cells within the germinal layer, the layer closest to the basal lamina. Stem cells within the lining of the gut provide for a rapid renewal rate of this tissue. ESCs or keratinocytes obtained from the skin or lining of the gut of a patient or donor can be grown in tissue culture (Rheinwald (1980) Meth. Cell Bio., 21A:229; Pittelkow et al. (1986) Mayo Clinic. Proc., 61:771). If the ESCs are provided by a donor, a method for suppression of host versus graft reactivity (e.g., irradiation, drug or antibody administration to promote moderate immunosuppression) can also be used.

With respect to hematopoietic stem cells (HSC), any technique which provides for the isolation, propagation, and maintenance in vitro of HSC can be used in this embodiment of the invention. Techniques by which this may be accomplished include (a) the isolation and establishment of HSC cultures from bone marrow cells isolated from the future host, or a donor, or (b) the use of previously established long-term HSC cultures, which may be allogeneic or xenogeneic. Non-autologous HSC may be used in conjunction with a method of suppressing transplantation immune reactions of the future host/patient. In a particular embodiment, human bone marrow cells can be obtained from the posterior iliac crest by needle aspiration (see, e.g., Kodo et al. (1984) J. Clin. Invest., 73:1377-1384). In one embodiment, the HSCs can be made highly enriched or in substantially pure form. This enrichment can be accomplished before, during, or after long-term culturing, and can be done by any techniques known in the art. Long-term cultures of bone marrow cells can be established and maintained by using, for example, modified Dexter cell culture techniques (Dexter et al. (1977) J. Cell Physiol., 91:335) or Whitlock-Witte culture techniques (Whitlock et al. (1982) Proc. Natl. Acad. Sci. U.S.A., 79:3608-3612).

Pharmaceutical Compositions and Administration

The invention further provides pharmaceutical compositions comprising an effective amount of an ERG or LTF therapeutic, including ERG or LTF nucleic acids (sense or antisense) or ERG or LTF polypeptides of the invention, in a pharmaceutically acceptable carrier, as described below.

Compositions comprising an effective amount of a polypeptide of the present invention, in combination with other components such as a physiologically acceptable diluent, carrier, or excipient, are provided herein. The polypeptides can be formulated according to known methods used to prepare pharmaceutically useful compositions. They can be combined in admixture, either as the sole active material or with other known active materials suitable for a given indication, with pharmaceutically acceptable diluents (e.g., saline, Tris-HCl, acetate, and phosphate buffered solutions), preservatives (e.g., thimerosal, benzyl alcohol, parabens), emulsifiers, solubilizers, adjuvants and/or carriers. Suitable formulations for pharmaceutical compositions include those described in Remington's Pharmaceutical Sciences, 16th ed., Mack Publishing Company, Easton, Pa., 1980.

In addition, such compositions can be complexed with polyethylene glycol (PEG), metal ions, or incorporated into polymeric compounds such as polyacetic acid, polyglycolic acid, hydrogels, dextran, etc., or incorporated into liposomes, microemulsions, micelles, unilamellar or multilamellar vesicles, erythrocyte ghosts or spheroblasts. Such compositions will influence the physical state, solubility, stability, rate of in vivo release, and rate of in vivo clearance, and are thus chosen according to the intended application.

The compositions of the invention can be administered in any suitable manner, e.g., topically, parenterally, or by inhalation. The term “parenteral” includes injection, e.g., by subcutaneous, intravenous, or intramuscular routes, also including localized administration, e.g., at a site of disease or injury. Sustained release from implants is also contemplated. One skilled in the art will recognize that suitable dosages will vary, depending upon such factors as the nature of the disorder to be treated, the patient's body weight, age, and general condition, and the route of administration. Preliminary doses can be determined according to animal tests, and the scaling of dosages for human administration is performed according to art-accepted practices.

Compositions comprising nucleic acids of the invention in physiologically acceptable formulations, e.g., to be used for gene therapy, are also contemplated. In one embodiment, the nucleic acid can be administered in vivo to promote expression of the encoded protein, by constructing it as part of an appropriate nucleic acid expression vector and administering it so that it becomes intracellular as described in other sections herein.

Various delivery systems are known in the art and can be used to administer a therapeutic of the invention. Examples include, but are not limited to, encapsulation in liposomes, microparticles, microcapsules, recombinant cells capable of expressing the therapeutic, receptor-mediated endocytosis (see, e.g., Wu et al. (1987) J. Biol. Chem., 262:4429-4432), construction of a therapeutic nucleic acid as part of a retroviral or other vector, etc. Methods of introduction include, but are not limited to, intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, intranasal, epidural, and oral routes. The compounds may be administered by any convenient route, for example by infusion or bolus injection, or by absorption through epithelial or mucocutaneous linings (e.g., oral mucosa, rectal and intestinal mucosa, etc.), and may be administered together with other biologically active agents. Administration can be systemic or local. In addition, it may be desirable to introduce the pharmaceutical compositions of the invention into the central nervous system by any suitable route, including intraventricular and intrathecal injection; intraventricular injection may be facilitated by an intraventricular catheter, for example, attached to a reservoir, such as an Ommaya reservoir. Pulmonary administration can also be employed, e.g., by use of an inhaler or nebulizer, and formulation with an aerosolizing agent.

In a specific embodiment, it may be desirable to administer the pharmaceutical compositions of the invention locally to the area in need of treatment; this may be achieved by, for example, local infusion during surgery, topical application, e.g., in conjunction with a wound dressing after surgery, by injection, by means of a catheter, a suppository, or an implant, wherein the implant is of a porous, non-porous, or gelatinous material, including membranes, such as silastic membranes, or fibers. In one embodiment, administration can be by direct injection at the site (or former site) of a malignant tumor or neoplastic or pre-neoplastic tissue.

In another embodiment, the therapeutic can be delivered in a vesicle, in particular a liposome (see Langer (1990) Science, 249:1527-1533; Treat et al. (1989) in Liposomes in the Therapy of Infectious Disease and Cancer, Lopez-Berestein et al., eds., Liss, New York, pp. 353-365; Lopez-Berestein, ibid., pp. 317-327). In yet another embodiment, the therapeutic can be delivered in a controlled release system. In one embodiment, a pump may be used (see Langer, supra; Sefton (1987) CRC Crit. Ref. Biomed. Eng., 14:201; Buchwald et al. (1980) Surgery, 88:507; Saudek et al. (1989) New Engl. J. Med., 321:574). In another embodiment, polymeric materials can be used (see Medical Applications of Controlled Release, Langer et al., eds., CRC Press, Boca Raton, Fla., 1974; Controlled Drug Bioavailability, Drug Product Design and Performance, Smolen et al., eds., Wiley, New York, 1984; Ranger et al. (1983) J. Macromol. Sci. Rev. Macromol. Chem., 23:61; see also Levy et al. (1985) Science, 228:190; During et al. (1989) Ann. Neurol., 25:351; Howard et al. (1989) J. Neurosurg., 71:105). In yet another embodiment, a controlled release system can be placed in proximity of the therapeutic target, i.e., the brain, thus requiring only a fraction of the systemic dose (see, e.g., Goodson (1984) in Medical Applications of Controlled Release, supra, vol. 2, pp. 115-138). Other controlled release systems are discussed in the review by Langer (1990, Science, 249:1527-1533).

Diagnosis and Screening

ERG, LTF, and/or AMACR proteins, analogues, derivatives, and fragments thereof, and antibodies thereto; ERG, LTF, DD3, and/or AMACR nucleic acids (and their complementary and homologous sequences) and antibodies thereto, including anti-ERG, anti-DD3, anti-LTF and/or anti-AMACR antibodies, have uses in diagnostics. Such molecules can be used in assays, such as immunoassays, to detect, prognose, diagnose, or monitor various conditions, diseases, and disorders affecting ERG, LTF, DD3, and/or AMACR expression, or monitor the treatment thereof, particularly cancer, and more particularly prostate cancer. In particular, such an immunoassay is carried out by a method comprising contacting a sample derived from an individual with an anti-ERG, anti-LTF, anti-DD3, and/or anti-AMACR antibody (directed against either a protein product or a nucleic acid) under conditions such that specific binding can occur, and detecting or measuring the amount of any specific binding by the antibody. In one embodiment, such binding of antibody, in tissue sections, can be used to detect aberrant ERG, LTF, DD3, and/or AMACR localization or aberrant (e.g., high, low or absent) levels of ERG, LTF, DD3, and/or AMACR. In a specific embodiment, antibody to ERG, LTF, DD3, and/or AMACR can be used to assay a biological sample (e.g., tissue, blood, or urine sample) for the presence of ERG, LTF, DD3, and/or AMACR, where an aberrant level of ERG, LTF, DD3, and/or AMACR is an indication of a diseased condition, such as cancer, including, for example, prostate cancer.

Any biological sample in which it is desired to detect an oligonucleotide or polypeptide of interest can be used, including tissue, cells, blood, lymph, semen, and urine. The biological sample is preferably derived from prostate tissue, blood, or urine. The tissue sample comprises cells obtained from a patient. The cells may be found in a prostate tissue sample collected, for example, by a prostate tissue biopsy or histology section, or a bone marrow biopsy. The blood sample can include whole blood, plasma, serum, or any derivative thereof, including, for example, circulating cells, such as prostate cells, isolated from the blood sample, or nucleic acid or protein obtained from the isolated cells. Blood may contain prostate cells, particularly when the prostate cells are cancerous, and, more particularly, when the prostate cancer metastasizes and is shed into the blood. Similarly, the urine sample can be whole urine or any derivative thereof, including, for example, cells, such as prostate cells, obtained from the urine.

The immunoassays which can be used include, but are not limited to, competitive and non-competitive assay systems using techniques such as Western blots, radioimmunoassays, ELISA, immunoprecipitation assays, immunodiffusion assays, agglutination assays, complement-fixation assays, immunoradiometric assays, fluorescent immunoassays, and protein A immunoassays, to name but a few.

ERG, LTF, DD3, and/or AMACR genes and related nucleic acid sequences and subsequences, including complementary sequences, can also be used in hybridization assays. ERG, LTF, DD3, and/or AMACR nucleic acid sequences, or subsequences thereof comprising at least about 8, 15, 20, 50, 100, 250, or 500 nucleotides, can be used as hybridization probes. Hybridization assays can be used to detect, prognose, diagnose, or monitor conditions, disorders, or disease states associated with aberrant changes in ERG, LTF, DD3, and/or AMACR expression and/or activity as described above. In particular, such a hybridization assay is carried out by a method comprising contacting a sample containing nucleic acid with a nucleic acid probe capable of hybridizing under defined conditions (preferably under high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) to an ERG, LTF, DD3, and/or AMACR nucleic acid, and detecting (i.e., measuring either qualitatively or quantitatively) the degree of the resulting hybridization. As described herein, any nucleic acid amplification procedure, including PCR/RT-PCR, TMA, NASBA, 3SR, LCR, SDA, and LAMP, can be used to detect the presence of the ERG, LTF, DD3 and/or AMACR gene and/or the level of its mRNA expression.

In some applications, probes exhibiting at least some degree of self-complementarity are desirable to facilitate detection of probe:target duplexes in a test sample without first requiring the removal of unhybridized probe prior to detection. Molecular torch probes are a type of self-complementary probe disclosed by Becker et al., U.S. Pat. No. 6,361,945. The molecular torch probes disclosed by Becker et al. have distinct regions of self-complementarity, referred to as "the target binding domain" and "the target closing domain," which are connected by a joining region and which hybridize to one another under predetermined hybridization assay conditions. When exposed to denaturing conditions, the complementary regions (which may be fully or partially complementary) of the molecular torch probe melt, leaving the target binding domain available for hybridization to a target sequence when the predetermined hybridization assay conditions are restored. When exposed to strand displacement conditions, a portion of the target sequence binds to the target binding domain and displaces the target closing domain from the target binding domain. Molecular torch probes are designed so that the target binding domain favors hybridization to the target sequence over the target closing domain. The target binding domain and the target closing domain of a molecular torch probe include interacting labels (e.g., luminescent/quencher) positioned so that a different signal is produced when the molecular torch probe is self-hybridized as opposed to when the molecular torch probe is hybridized to a target nucleic acid, thereby permitting detection of probe:target duplexes in a test sample in the presence of unhybridized probe having a viable label or labels associated therewith.

Further examples of detection probes having self-complementarity are the molecular beacon probes disclosed by Tyagi et al. in U.S. Pat. No. 5,925,517. Molecular beacon probes include nucleic acid molecules having a target complement sequence, an affinity pair (or nucleic acid arms) holding the probe in a closed conformation in the absence of a target nucleic acid sequence, and a label pair that interacts when the probe is in a closed conformation. Hybridization of the target nucleic acid and the target complement sequence separates the members of the affinity pair, thereby shifting the probe to an open conformation. The shift to the open conformation is detectable due to reduced interaction of the label pair, which may be, for example, a fluorophore and quencher, such as EDANS and DABCYL.

By way of example, ERG, LTF, AMACR, or DD3 hybridization probes can comprise a nucleic acid having a contiguous stretch of at least about 8, 15, 20, 50, 100, 250, 500, 750, 1000, 1250, or 1500 nucleotides of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5 or a sequence complementary thereto. Such contiguous fragments of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5 may also contain at least one mutation so long as the mutant sequence retains the capacity to hybridize to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5 under low or high stringency conditions (preferably under high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes).

In specific embodiments, diseases and disorders involving hyperproliferation of cells, such as cancers, including, for example, prostate cancer, can be diagnosed, or their suspected presence can be screened for, or a predisposition to develop such disorders can be predicted, by detecting levels of the ERG, LTF, and/or AMACR protein, ERG, DD3, and/or AMACR RNA, or ERG, DD3, and/or AMACR functional activity, or by detecting mutations in ERG, DD3, LTF and/or AMACR RNA, DNA, or protein (e.g., translocations in ERG, LTF, DD3, or AMACR nucleic acids, truncations in the ERG, LTF, DD3, or AMACR gene or protein, changes in nucleotide or amino acid sequence relative to wild-type ERG, LTF, DD3, or AMACR) that cause increased or decreased expression or activity of ERG, LTF, DD3, and/or AMACR. By way of example, levels of ERG, LTF, and/or AMACR protein can be detected by immunoassay; levels of ERG, LTF, DD3, and/or AMACR mRNA can be detected by hybridization assays (e.g., Northern blots, dot blots, or any nucleic acid amplification procedure, including PCR/RT-PCR, TMA, NASBA, 3SR, LCR, SDA, and LAMP); and translocations and point mutations in ERG, LTF, DD3, and/or AMACR nucleic acids can be detected by Southern blotting, RFLP analysis, any nucleic acid amplification procedure, including PCR/RT-PCR, TMA, NASBA, 3SR, LCR, SDA, and LAMP, sequencing of the ERG, LTF, DD3, and/or AMACR genomic DNA or cDNA obtained from the patient, etc.

In one embodiment, levels of the ERG, DD3, LTF and/or AMACR mRNA or protein in a subject sample are detected or measured and compared to the mRNA or protein expression levels of the corresponding gene in a control sample or to a standard numerical value or range. For example, increased expression levels of ERG, DD3, and/or AMACR or decreased levels of LTF, relative to a matched, normal tissue sample, indicate that the subject has a malignancy or hyperproliferative disorder, including, for example, prostate cancer, or a predisposition to develop the same. Other appropriate controls include other noncancerous samples from the subject, samples obtained from a different subject without cancer, or other cancer-specific markers. For example, in prostate cancer, a prostate-cell specific marker, such as PSA, can be used as a control to compare and/or normalize expression levels of other genes, such as ERG, LTF, DD3, and/or AMACR. In one embodiment, a method of diagnosing cancer, such as prostate cancer, comprises obtaining a biological sample from a subject (e.g., a tissue sample (e.g., from biopsy tissue), a blood sample, or a urine sample), determining the expression level of an ERG, LTF, DD3, and/or AMACR gene and/or ERG, LTF, DD3, and/or AMACR activity in the sample, and diagnosing or prognosing cancer in said subject. In further embodiments, the expression level of the ERG, LTF, DD3, and/or AMACR gene and/or ERG, LTF, DD3, and/or AMACR activity is determined by Southern blotting, Northern blotting, Western blotting, ELISA, any nucleic acid amplification procedure, including PCR/RT-PCR, TMA, NASBA, 3SR, LCR, SDA, and LAMP, or other techniques as described herein or known in the art. Without limiting the instant invention, increased or decreased expression of at least two times, as compared to the control sample, indicates the presence of prostate cancer or a higher predisposition to developing prostate cancer.
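By way of illustration only, the comparison described above can be sketched in Python as follows. The sketch assumes that expression levels are available as positive numbers, that PSA serves as the normalizing marker, and that the two-fold threshold is applied as a ratio of at least 2 for ERG, DD3, and AMACR and at most 1/2 for LTF. The function names and the expression values are hypothetical and are not taken from the Examples.

```python
def normalized_fold_change(subject, control, gene, reference="PSA"):
    """Ratio of subject to control expression, each first divided by the reference marker."""
    subject_level = subject[gene] / subject[reference]
    control_level = control[gene] / control[reference]
    return subject_level / control_level


def flag_aberrant(subject, control, up_genes=("ERG", "DD3", "AMACR"),
                  down_genes=("LTF",), threshold=2.0):
    """Return, per gene, whether expression changed at least `threshold`-fold
    in the expected direction (up for up_genes, down for down_genes)."""
    flags = {}
    for gene in up_genes:
        flags[gene] = normalized_fold_change(subject, control, gene) >= threshold
    for gene in down_genes:
        flags[gene] = normalized_fold_change(subject, control, gene) <= 1.0 / threshold
    return flags


# Hypothetical expression values (arbitrary units), for illustration only.
subject = {"ERG": 800.0, "AMACR": 450.0, "DD3": 300.0, "LTF": 50.0, "PSA": 1000.0}
control = {"ERG": 100.0, "AMACR": 150.0, "DD3": 200.0, "LTF": 400.0, "PSA": 1000.0}

flags = flag_aberrant(subject, control)
for gene, aberrant in flags.items():
    print(gene, "aberrant" if aberrant else "within two-fold of control")
```

With these hypothetical values, ERG (8-fold up), AMACR (3-fold up), and LTF (8-fold down) exceed the two-fold threshold, while DD3 (1.5-fold up) does not.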

Another aspect of the invention provides a means for monitoring a response to "hormonal therapy" by evaluating the expression profiles of the ERG gene, alone or in combination with the AMACR, DD3, and/or LTF genes, and correlating these profiles with the clinical signs of the disease.

Kits for diagnostic use are also provided. A kit comprises an anti-ERG gene antibody or an antibody directed against the ERG protein and/or an anti-AMACR gene antibody or an antibody directed against the AMACR protein and/or an anti-DD3 gene antibody and/or an anti-LTF gene antibody or an antibody directed against the LTF protein, which can be optionally detectably labeled. A kit is also provided that comprises a nucleic acid probe capable of hybridizing under defined conditions (preferably under high stringency hybridization conditions, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) to ERG, LTF, DD3, and/or AMACR nucleic acid. In a specific embodiment, a kit comprises at least a pair of primers (e.g., each in the size range of at least about 6, 17, 30, or 60 nucleotides) that are capable of priming amplification, by any nucleic acid amplification procedure (including, e.g., PCR/RT-PCR, TMA, NASBA, 3SR, LCR, SDA, LAMP), of the ERG, LTF, DD3, and/or AMACR gene or a fragment thereof. A kit can comprise a predetermined amount of a purified ERG, LTF, DD3, and/or AMACR protein or nucleic acid for use, e.g., as a standard or control. The kit can also comprise one or more components for detecting the nucleic acid probe, including components described herein or known in the art.

In one embodiment, the kit comprises a nucleic acid that hybridizes under defined conditions (and preferably under conditions of high stringency, e.g., hybridization for 48 hours at 65° C. in 6×SSC followed by a wash in 0.1×SSC at 50° C. for 45 minutes) with at least one gene chosen from those genes identified in Tables 1-6 or the DD3 gene, and is affixed to a support, alone, or in combination with other nucleic acids. For example, an ERG and/or LTF nucleic acid can be affixed to the support, with or without other nucleic acids. In a specific embodiment, the support comprises at least an ERG nucleic acid and an AMACR nucleic acid or at least an ERG nucleic acid and a DD3 nucleic acid. In another embodiment, the support comprises at least an ERG nucleic acid, an AMACR nucleic acid, and a DD3 nucleic acid. This support can be used as part of a kit for detecting cancer, such as prostate cancer. These kits can further comprise at least a pair of primers (e.g., each in the size range of at least about 6, 17, 30, or 60 nucleotides) that are capable of priming amplification, by any nucleic acid amplification procedure (including, e.g., PCR/RT-PCR, TMA, NASBA, 3SR, LCR, SDA, LAMP), of the ERG, LTF, DD3, and/or AMACR gene or a fragment thereof.

EXAMPLES

Example 1

Screening of CaP Cell-Specific Gene Expression Signatures Using Affymetrix GeneChip

Patient Selection

Specimens were obtained under an IRB-approved protocol from patients treated by radical prostatectomy (RP) at Walter Reed Army Medical Center (WRAMC). From over 300 patients, two groups were selected which had prostate tumors with either moderate (MR) or high risk (HR) of disease progression after RP. The HR group had PSA recurrence, Gleason score 8-9, T3c stage, seminal vesicle invasion, and poorly differentiated tumor cells; the MR group had no PSA recurrence, Gleason score 6-7, T2a-T3b stage, no seminal vesicle invasion, and well to moderately differentiated tumor cells. LCM compatible specimens were selected from age and race matched HR or MR patients with no family history of CaP.

Tissue Samples and Laser-Capture Microdissection

Normal and cancer cells were laser capture microdissected (LCM) from OCT embedded and Hematoxylin-eosin (H&E) stained frozen prostate sections of radical prostatectomy specimens (2000 laser shots for one sample). Laser capture microdissection (LCM) facilitates the isolation of morphologically defined, homogeneous cell populations from complex tissues by selectively adhering the cells of interest to a transparent film with focused pulses of low energy infrared laser under a microscope. Emmert-Buck et al., Science (1996) 274(5289): 921-922; Schutz et al., Nat Biotechnol (1998) 16(8): 737-742.

RNA Extraction and T7-Based Linear RNA Amplification

Total RNA was isolated from the LCM samples with the MicroRNA kit (Stratagene, La Jolla, Calif.), quantified using RiboGreen dye (Molecular Probes, Eugene, Oreg.) and a VersaFluor fluorimeter (BioRad, Hercules, Calif.), and quality tested by RT-PCR using NKX3.1 and GAPDH primers. Linear RNA amplification was performed using the RiboAmp RNA amplification kit (Arcturus, Mountain View, Calif.). Specifically, 2 nanograms of total RNA from LCM derived epithelial cells of normal as well as tumor tissue from each patient was used for the first round of amplification. During the second round of amplification, after cDNA synthesis and purification, the samples were biotinylated during in vitro transcription, and the biotinylated product was used for the GeneChip analysis.

GeneChip Analysis

Linearly amplified aRNA was hybridized to a high-density oligonucleotide human genome array (HG U133A array) (Affymetrix, Santa Clara, Calif., USA). The array contains 22,283 probe sets, about 18,000 of which represent well annotated genes, while the remainder represent various expressed sequence tags (ESTs) and hypothetical genes. Biotinylation of the aRNA was carried out by in vitro transcription from cDNA using the MEGAscript T7 In Vitro Transcription Kit (Ambion, Austin, Tex., USA) with biotinylated UTP and biotinylated CTP (ENZO, Farmingdale, N.Y., USA) (34). The biotin labeled cRNA was purified using QIAGEN RNeasy spin columns (QIAGEN, Valencia, Calif.) following the manufacturer's protocol. The biotin labeled cRNA was fragmented in a 40 μl reaction mixture containing 40 mM Tris-acetate, pH 8.1, 100 mM potassium acetate, and 30 mM magnesium acetate, incubated at 94° C. for 35 minutes, and then put on ice.

Hybridization, Staining and Scanning of the GeneChip

The biotin labeled and fragmented aRNA was hybridized to the HG U133A array. Briefly, a 220 μl hybridization solution consisting of 1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton X-100, 50 μM control Oligo B2 (5′ bioGTCAAGATGCTACCGTTCAG 3′) (SEQ ID NO:6) (Affymetrix); the control cRNA cocktail of Bio B (150 μM), Bio C (500 μM), Bio D (2.5 nM) and Cre X (10 nM) (American Type Culture Collection, Manassas, Va. and Lofstrand Labs, Gaithersburg, Md.); 0.1 mg/ml herring sperm DNA; and 0.05 μg/μl of the fragmented labeled sample cRNA was heated to 95° C. for 35 min., cooled to 40° C., and clarified by centrifugation. Hybridization was at 42° C. in a rotisserie hybridization oven (Model 320, Affymetrix) at 60 rpm for 16 hours. Following hybridization, the GeneChip arrays were washed 10 times at 25° C. with 6×SSPE-T buffer (1 M NaCl, 0.006 M EDTA, and 0.06 M Na₃PO₄, 0.005% Triton X-100, pH 7.6) using the automated fluidics station protocol. GeneChip arrays were incubated at 50° C. in 0.5×SSPE-T, 0.005% Triton X-100 for 20 minutes at 60 rpm in the rotisserie oven. GeneChip arrays were stained for 15 minutes at room temperature and at 60 rpm with streptavidin phycoerythrin (Molecular Probes, Inc., Eugene, Oreg.) stain solution at a final concentration of 10 μg/ml in 6×SSPE-T buffer and 1.0 mg/ml acetylated bovine serum albumin (Sigma). GeneChip arrays were washed twice at room temperature with 6×SSPE-T buffer, and then were scanned with the HP GeneArray Scanner (Hewlett-Packard, Santa Clara, Calif.) controlled by GeneChip 3.1 Software (Affymetrix).

Example 2

Analysis of GeneChip Results by Supervised Multi-Dimensional Scaling (MDS)

Image Analysis and Data Collection

Affymetrix GeneChip Microarray Analysis Software, version 3.1, and Affymetrix Micro DB and Data Mining Tool version 2.0 (Affymetrix), Microsoft Excel 2000 (Microsoft, Seattle, Wash.) and Statistica version 4.1 (Stat Soft, Inc., Tulsa, Okla.) were used. In the Affymetrix system, the average difference fluorescence is the average of the difference between every perfect match probe cell and its control mismatch probe cell and is directly related to the level of expression of a transcript. A comparative file indicates the relative change in abundance (fold change) for each transcript between a baseline and an experimental sample. For further detail and advanced bioinformatic analysis, we used the Microarray Data Analysis software from NHGRI and the GeneSpring software (Silicon Genetics, CA).
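As a simplified illustration of the average difference described above, the following Python sketch averages the perfect match minus mismatch intensities over the probe pairs of one probe set. It is not the Affymetrix software algorithm, which applies additional steps not described here; the function name and the intensity values are hypothetical.

```python
def average_difference(perfect_match, mismatch):
    """Mean of (perfect match - mismatch) over the probe pairs of one probe set."""
    if not perfect_match or len(perfect_match) != len(mismatch):
        raise ValueError("need the same non-zero number of PM and MM intensities")
    differences = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(differences) / len(differences)


# Hypothetical fluorescence intensities for four probe pairs of one transcript.
pm_intensities = [1200, 900, 1500, 1100]
mm_intensities = [200, 300, 500, 100]

avg_diff = average_difference(pm_intensities, mm_intensities)
print(avg_diff)  # 900.0
```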

Data Analysis

For clustering analysis, National Human Genome Research Institute (NHGRI) Microarray Data Analysis software was used, which partitioned the samples of the high risk and moderate risk groups into well-separated and homogeneous groups based on the statistical behavior of their gene expression. To achieve the objective of clustering each of the groups, all pair-wise similarities between samples were evaluated, and the samples were then grouped via the average linkage algorithm. The Pearson correlation coefficient or the Euclidean distance was typically used to quantify the similarity. Unsupervised hierarchical and/or non-hierarchical clustering was also performed using the same distance matrix.
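The average linkage grouping described above can be sketched in Python as follows. This is a minimal stand-alone illustration, not the NHGRI software: it assumes that dissimilarity is taken as 1 minus the Pearson correlation coefficient, and the sample names and expression profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    average pairwise distance (1 - Pearson r) is smallest."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical expression profiles (four genes per sample), for illustration only.
profiles = {
    "HR1": [1.0, 2.0, 3.0, 4.0],
    "HR2": [2.0, 4.0, 6.0, 9.0],
    "MR1": [4.0, 3.0, 2.0, 1.0],
    "MR2": [9.0, 6.0, 4.0, 2.0],
}
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

In this toy data set the two hypothetical high risk profiles rise across the genes and the two moderate risk profiles fall, so the samples separate into one cluster per group.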

Using a matrix of Euclidean distance measurements from complete pairwise comparison of all the prostate specimens, a multidimensional scaling (MDS) method was performed using an implementation of MDS in the MATLAB package to determine the overall similarities and dissimilarities in gene expression profiles. A weighted gene analysis was performed to generate a subset of genes statistically significant in separating the high risk group from the moderate risk group.

Briefly, for two different groups, e.g., epithelium of high risk tumor and epithelium of moderate risk tumor, with given numbers of samples (here 25 and 25), the discriminative weight for each gene is determined by the formula: w = d_B/(k₁d_w1 + k₂d_w2 + α), where d_B is the Euclidean distance between the two groups (center-to-center or between-cluster Euclidean distance), d_w1 is the average Euclidean distance among all the epithelial samples of the high risk group, d_w2 is the average Euclidean distance among all the epithelial samples of the moderate risk group, k₁=25/(25+25), k₂=25/(25+25), and α is a small constant that ensures the denominator is never equal to zero. Genes were ranked according to their w values. Genes with high w values created greater separation between groups and denser compaction within the group. In other words, the subset of genes with high w values has the most discriminative power to differentiate a high risk group from a moderate risk group and vice versa. Sample labels were randomly permuted and the w value was computed again for each gene to test the statistical significance of the discriminative weights. Genes with the most significant expression differences were selected by p-values. A hierarchical clustering algorithm was used to verify the predictor model obtained from the supervised MDS analysis.
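The discriminative weight and its permutation test can be sketched in Python for a single gene as follows. Several points are assumptions made for illustration and are not stated in the text above: the between-group distance d_B is taken as the absolute difference of the group means, each within-group distance is taken as the mean absolute difference over all sample pairs, α is set to 1e-6, and the p-value is estimated as the fraction of label permutations whose w is at least the observed w. The function names and expression values are hypothetical.

```python
import random
from itertools import combinations


def mean_pairwise_distance(values):
    """Average absolute difference over all pairs of samples in one group."""
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def discriminative_weight(group1, group2, alpha=1e-6):
    """w = d_B / (k1*d_w1 + k2*d_w2 + alpha) for one gene."""
    n1, n2 = len(group1), len(group2)
    k1, k2 = n1 / (n1 + n2), n2 / (n1 + n2)
    d_b = abs(sum(group1) / n1 - sum(group2) / n2)   # center-to-center distance
    d_w1 = mean_pairwise_distance(group1)
    d_w2 = mean_pairwise_distance(group2)
    return d_b / (k1 * d_w1 + k2 * d_w2 + alpha)


def permutation_p_value(group1, group2, n_permutations=1000, seed=0):
    """Estimate how often randomly permuted labels give w >= the observed w."""
    rng = random.Random(seed)
    observed = discriminative_weight(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if discriminative_weight(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values of one gene in five HR and five MR samples.
high_risk = [10.0, 11.0, 12.0, 10.5, 11.5]
moderate_risk = [4.0, 5.0, 6.0, 4.5, 5.5]

w = discriminative_weight(high_risk, moderate_risk)
p = permutation_p_value(high_risk, moderate_risk)
print(round(w, 3), p)
```

For these hypothetical values the group means differ by 6 and the mean within-group distance is 1 in each group, so w is approximately 6. Ranking genes then amounts to sorting them by w or by the permutation p-value.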

From this analysis, specific genes were identified whose expression signature in tumor tissue varied from their expression signature in benign matched tissue. Genes with a p-value of not more than 0.05 were selected and ranked by p-value, as shown in Tables 1-6.

In Silico Validation:

We have tested the discriminatory potential of the genes that we obtained from our analysis on some independent data sets. Affymetrix oligonucleotide GeneChip Hum95Av2 data were obtained from Welsh et al. (2001) and Singh et al. Genes from these data sets that correspond with the genes of our discriminatory list were selected, and their tumor specific expression intensities and/or tumor over normal ratios were used for an MDS analysis as described above in the data analysis section. MDS plots were obtained depicting the discriminatory capability of the genes on the independent data sets.

TABLE 1. The first 50 genes obtained from the supervised MDS analysis of tumor versus benign tissues of all the high risk and moderate risk CaP patients, ranked by p-value (T vs B in All 18 Samples).

No. | GenBank Accession | Common Name of Genes | Description of Genes | Map | p-Value | Expression Regulation in Tumor | Expression Regulation in Benign
1 | AF047020 | AMACR | Alpha-methylacyl-CoA racemase | 5p13.2-q11.1 | 0 | Up | Down
2 | NM_002343 | LTF | Lactotransferrin | 3q21-q23 | 0 | Down | Up
3 | NM_002275 | KRT15 | Keratin 15 | 17q21 | 0.000001 | Down | Up
4 | BC000915 | CLIM1, CLP36, CLP-36 | PDZ and LIM domain 1 (elfin) | 10q22-q26.3 | 0.000001 | Down | Up
5 | X90579 | CYP3A5 | Cytochrome P450, subfamily 3A, polypeptide 5 | 7 | 0.000001 | Down | Up
6 | NM_003671 | CDC14B1, CDC14B2 | H. sapiens CDC14 cell division cycle 14 homolog B | 9q22.2-q22.31 | 0.000005 | Down | Up
7 | AI424243 | CEGP1 | H. sapiens cDNA clone IMAGE: 2094442 | 11 | 0.000005 | Down | Up
8 | NM_022370 | Rbig1 | Hypothetical protein FLJ21044 similar to Rbig1 | 11q24.2 | 0.000009 | Down | Up
9 | AI356398 | ZFP36L2 | TISD_HUMAN P47974 TIS11D PROTEIN | 2 | 0.000018 | Down | Up
10 | NM_005213 | STF1, STFA | Cystatin A (stefin A) | 3q21 | 0.000018 | Down | Up
11 | NM_006394 | RIG | Regulated in glioma | 11p15.1 | 0.000018 | Down | Up
12 | AF275945 | EVA1 | Epithelial V-like antigen 1 | 11q23.3 | 0.000018 | Down | Up
13 | NM_020186 | DC11 | DC11 protein | 7q21.3 | 0.000018 | Up | Down
14 | AI922538 | TMEM1 | Transmembrane protein 1 | 21 | 0.000018 | Down | Up
15 | NM_014863 | BRAG, KIAA0598 | B cell RAG associated protein | 10q26 | 0.000018 | Down | Up
16 | AI669229 | RARRES1 | Homo sapiens cDNA clone IMAGE: 2315074 | 3q25.33 | 0.000036 | Down | Up
17 | NM_006017 | AC133, CD133 | Prominin (mouse)-like 1 | 4p15.33 | 0.000036 | Down | Up
18 | NM_004503 | HOXC6 | Homeo box C6 | 12q12-q13 | 0.000036 | Up | Down
19 | NM_005084 | PAFAH, LDL-PLA2 | Phospholipase A2, group VII | 6p21.2-p12 | 0.000036 | Up | Down
20 | NM_001511 | MGSA, CXCL1, SCYB1 | GRO1 oncogene | 4q21 | 0.000071 | Down | Up
21 | BG054844 | ARHE | H. sapiens cDNA clone IMAGE: 3441573 | 2q23.3 | 0.000071 | Down | Up
22 | NM_007191 | WIF-1 | Wnt inhibitory factor-1 | 12q14.2 | 0.000071 | Down | Up
23 | X99268 | TWIST | Twist (Drosophila) homolog | 7p21.2 | 0.000071 | Up | Down
24 | AI826799 | EFEMP1 | EXTRACELLULAR PROTEIN S1-5 PRECURSOR | 2p16 | 0.000071 | Down | Up
25 | NM_001018 | RPS15 | Ribosomal protein S15 | 19p13.3 | 0.000071 | Up | Down
26 | AV711904 | LYZ | Lysozyme (renal amyloidosis) | | 0.000071 | Down | Up
27 | AI433463 | MME | NEPRILYSIN (HUMAN) | 3q25.1-q25.2 | 0.000071 | Down | Up
28 | BE908217 | ANXA2 | H. sapiens cDNA clone IMAGE: 3902323 | 15q21-q22 | 0.000071 | Down | Up
29 | NM_000441 | PDS, DFNB4 | Solute carrier family 26, member 4 | 7q31 | 0.000071 | Down | Up
30 | BC003068 | SLC19A1 | Solute carrier family 19, member 1 | 21q22.3 | 0.000071 | Up | Down
31 | NM_005950 | MT1 | Metallothionein 1G | 16q13 | 0.000071 | Down | Up
32 | NM_013281 | FLRT3 | Fibronectin leucine rich transmembrane protein 3 | 20p11 | 0.000071 | Down | Up
33 | AI351043 | ESTs | H. sapiens cDNA clone IMAGE: 1948310 | 21 | 0.000145 | Up | Down
34 | NM_001099 | PAP | Acid phosphatase, prostate | 3q21-q23 | 0.000145 | Down | Up
35 | NM_006113 | VAV3 | Vav 3 oncogene | 1p13.1 | 0.000145 | Down | Up
36 | NM_005980 | S100P | S100 calcium-binding protein P | 4p16 | 0.000145 | Down | Up
37 | NM_000165 | GJA1 | Gap junction protein, alpha 1, 43 kD (connexin 43) | 6q21-q23.2 | 0.000145 | Down | Up
38 | NM_003897 | DIF2, IEX1, PRG1 | Immediate early response 3 | 6p21.3 | 0.000145 | Down | Up
39 | BC001388 | ANX2, LIP2, CAL1H | Annexin A2 | 15q21-q22 | 0.000145 | Down | Up
40 | BC003070 | HDR, MGC5445 | GATA-binding protein 3 | 10p15 | 0.000145 | Down | Up
41 | NM_020139 | LOC56898 | Oxidoreductase UCPA | 4 | 0.000145 | Down | Up
42 | AK002207 | KIAA0610 | KIAA0610 protein | 13 | 0.000145 | Down | Up
43 | NM_000574 | CR, TC, CD55 | Decay accelerating factor for complement | 1q32 | 0.000145 | Down | Up
44 | NM_006926 | SP-A2, COLEC5 | Surfactant, pulmonary-associated protein A2 | 10q22-q23 | 0.000145 | Up | Down
45 | U37546 | API2, MIHC, CIAP2 | Baculoviral IAP repeat-containing 3 | 11q22 | 0.000145 | Down | Up
46 | AU148057 | DKK3 | H. sapiens cDNA clone MAMMA1002489 | 11pter-p15.5 | 0.000145 | Down | Up
47 | NM_002600 | DPDE4, PDEIVB | Phosphodiesterase 4B, cAMP-specific | 1p31 | 0.000145 | Down | Up
48 | S59049 | BL34, IER1, IR20 | Regulator of G-protein signalling 1 | 1q31 | 0.0003 | Down | Up
49 | NM_001275 | CGA, CgA | Chromogranin A (parathyroid secretory protein 1) | 14q32 | 0.0003 | Down | Up
50 | AL575509 | ETS2 | H. sapiens cDNA clone CS0DI059YP21 = | 21q22.2 | 0.0003 | Down | Up

TABLE 2. The first 50 genes from the supervised MDS analysis of the tumor over benign (T/B) tissue ratio (fold change) of the high risk versus moderate risk CaP patients, ranked by p-value (T/B Fold Change in HR vs MR).

No. | GenBank Accession | Common Name of Genes | Description of Genes | Map | p-Value
1 | NM_004522 | KINN, NKHC, NKHC2, NKHC-2 | Kinesin family member 5C | 2q23.3 | 0.00011
2 | J03198 | GNAI3 | Guanine nucleotide binding protein G (K), alpha subunit | 1p13 | 0.000981
3 | NM_018010 | HIPPI, FLJ10147 | Hypothetical protein FLJ10147 | 3q13.13 | 0.003257
4 | NM_005479 | FRAT1 | Frequently rearranged in advanced T-cell lymphomas | 10q23.33 | 0.004964
5 | NM_021795 | SAP1 | ELK4, ETS-domain protein (SRF accessory protein 1) | 1q32 | 0.004964
6 | NM_003113 | LEU5, RFP2 | Nuclear antigen Sp100 | 2q37.1 | 0.004964
7 | NM_002053 | GBP1 | Guanylate binding protein 1, interferon-inducible, 67 kD | 1p22.1 | 0.004964
8 | AF064092 | GSA, GSP, GPSA, GNAS1, | Guanine nucleotide regulatory protein | 20q13.2-q13.3 | 0.007579
9 | BC003070 | HDR, MGC5445 | GATA-binding protein 3 | 10p15 | 0.007579
10 | NM_012245 | SKIP, NCOA-62 | SKI-interacting protein | 14q24.3 | 0.007579
11 | NM_015895 | LOC51053 | Geminin | 6p22.2 | 0.007579
12 | AA083478 | TRIM22 | Stimulated trans-acting factor (50 kDa) | 11 | 0.007579
13 | NM_000100 | PME, CST6, EPM1, STFB | Cystatin B (stefin B) | 21q22.3 | 0.007579
14 | NM_003031 | SIAH1 | Seven in absentia (Drosophila) homolog 1 | 16q12 | 0.007579
15 | NM_003407 | TTP, GOS24, TIS11, NUP475 | Zinc finger protein 36, C3H type, homolog (mouse) | 19q13.1 | 0.007579
16 | BF979419 | ESTs | ESTs, Highly similar to 60S ribosomal protein 13A [H. sapiens] | 19q13.33 | 0.007579
17 | NM_021038 | MBNL | Muscleblind (Drosophila)-like | 3q25 | 0.007579
18 | NM_014454 | PA26 | P53 regulated PA26 nuclear protein | 6q21 | 0.007579
19 | BC004399 | DEME-6 | DEME-6 protein | 1p32.3 | 0.007579
20 | NM_018490 | LGR4 | G protein-coupled receptor 48 | 11p14-p13 | 0.007579
21 | NM_004328 | BCS, BCS1, h-BCS, Hs.6719 | BCS1 (yeast homolog)-like | 2q33 | 0.007579
22 | D87445 | KIAA0256 | KIAA0256 gene product | 15 | 0.007579
23 | NM_006326 | NIFIE14 | Homo sapiens seven transmembrane domain protein, mRNA | 19q13.12 | 0.007579
24 | D83077 | TTC3 | Tetratricopeptide repeat domain 3 | Xq13.1 | 0.007579
25 | NM_006732 | GOS3 | FBJ murine osteosarcoma viral oncogene homolog B | 19q13.32 | 0.007579
26 | NM_003760 | EIF4G3 | Eukaryotic translation initiation factor 4 gamma, 3 | 1pter-p36.13 | 0.007579
27 | NM_004905 | AOP2 | Anti-oxidant protein 2 | 1q24.1 | 0.01159
28 | NM_018439 | IMPACT | Hypothetical protein IMPACT | 18 | 0.01159
29 | BC000629 | DARS | Aspartyl-tRNA synthetase | 2q21.2 | 0.01159
30 | AK002064 | DKFZP564A2416 | DKFZP564A2416 protein | 2 | 0.01159
31 | NM_013387 | HSPC051 | Ubiquinol-cytochrome c reductase complex (7.2 kD) | 22 | 0.01159
32 | AA135522 | KIAA0089 | Homo sapiens KIAA0089 mRNA sequence. | 3 | 0.01159
33 | NM_015545 | KIAA0632 | KIAA0632 protein | 7q22.1 | 0.01159
34 | NM_005767 | P2Y5 | Purinergic receptor (family A group 5) | 13q14 | 0.01159
35 | BC003682 | G25K, CDC42Hs | Cell division cycle 42 (GTP-binding protein, 25 kD) | 1p36.1 | 0.01159
36 | NM_005053 | RAD23A | RAD23 (S. cerevisiae) homolog A | 19p13.2 | 0.017805
37 | AI672541 | IPW | Human non-translated mRNA sequence. | 15q11-q12 | 0.017805
38 | AK023938 | H. sapiens cDNA FLJ13876 clone | SELECTED MODEL ORGANISM PROTEIN SIMILARITIES | 2q37.3 | 0.017805
39 | NM_000062 | C1IN, C1NH, C1-INH | Serine (or cysteine) proteinase inhibitor, clade G (C1 inhibitor) | 11q12-q13.1 | 0.017805
40 | AA576961 | PHLDA1 | Pleckstrin homology-like domain, family A, member 1 | 12q15 | 0.017805
41 | AI796269 | NBS1, ATV, NIBRIN | H. sapiens cDNA similar to Cell Cycle Regulatory Protein P95. | 8q21-q24 | 0.017805
42 | NM_000016 | ACADM | Acyl-Coenzyme A dehydrogenase, C-4 to C-12 straight chain | 1p31 | 0.017805
43 | AI867102 | KIAA0906, NUP210, gp210 | Nuclear pore membrane glycoprotein 210 | 3p25.2-p25.1 | 0.017805
44 | AI263909 | ARHB, RHOB, RHOH6 | Oncogene RHO6; Aplysia RAS-related homolog 6 | 2pter-p12 | 0.017805
45 | NM_016021 | NCUBE1 | Non-canonical ubiquitin conjugating enzyme 1 | 6 | 0.017805
46 | NM_012192 | TIM9B, TIM10B | Fracture callus 1 (rat) homolog | 11p15.5-p15.3 | 0.017805
47 | NM_025087 | FLJ21511 | Hypothetical protein FLJ21511 | 4 | 0.017805
48 | NM_014959 | CARD8, CARDINAL, KIAA0955 | Tumor up-regulated CARD-containing antagonist of caspase 9 | 19q13.33 | 0.017805
49 | AA923354 | MAOA | Monoamine oxidase A. | Xp11.4-p11.3 | 0.017805
50 | NM_021964 | ZNF148 | Zinc finger protein 148 (pHZ-52) | 3q21 | 0.017805
51 | NM_001674 | ATF3 | Activating transcription factor 3 | 1q32.3 | 0.017805

TABLE 3 The first 50 genes obtained from the supervised MDS analysis oftumor versus benign tissues of all the high risk CaP patients, ranked byp-value. (T vs N Intensities of 9 HR) Common Expression GenBank Name ofRegulation No. Accession Genes Description of Genes Map p-Value TumorBenign 1. U65585 HLA-DR1B Major histocompatibility complex, class II,6p21.3 0.00002 Down Up DR beta 1 2. NM_002053 GBP1 Guanylate bindingprotein 1, interferon-inducible, 1p22.1 0.000076 Down Up 3. NM_021983HLA-DRB4 Major histocompatibility complex, class II, 6 0.000076 Down UpDR beta 4 4. AI424243 CEGP1 Homo sapiens cDNA clone IMAGE: 2094442 110.000102 Down Up 5. NM_002343 LTF Lactotransferrin 3q21-q23 0.000138Down Up 6. NM_014575 SCHIP-1 Schwannomin-interacting protein 1 3q26.10.000257 Down Up 7. BC001169 ESD Esterase D/formylglutathione hydrolase13q14.1-q14.2 0.000357 Up Down 8. BF970427 UGCG UDP-glucose ceramideglucosyltransferase 9 0.000357 Down Up 9. NM_002275 KRT15 Keratin 1517q21 0.000495 Down Up 10. AU148057 DKK3 H. sapiens cDNA cloneMAMMA1002489 11pter-p15.5 0.000495 Down Up 11. AI922538 TMEM1Transmembrane protein 1 21 0.000689 Down Up 12. NM_004481 GALNAC-T2UDP-GalNAc transferase 2 1q41-q42 0.000689 Down Up 13. BC003070 HDR,GATA-binding protein 3 10p15 0.00097 Down Up MGC5445 14. BF979419 ESTs,similar H. sapiens 60S Ribosomal protein L13A 0.00097 Up Down to RPL13A15. BG054844 ARHE H. sapiens cDNA clone IMAGE: 3441573 2q23.3 0.00097Down Up 16. L42024 HLA-B Major histocompatibility complex, class I, B6p21.3 0.00138 Down Up 17. AL545982 CCT2 H. sapiens cDNA cloneCS0DI023YD15 12q15 0.001992 Up Down 18. NM_001993 TF, TFA, Coagulationfactor III (thromboplastin, tissue factor) 1p22-p21 0.001992 Up DownCD142 19. NM_004198 CHRNA6 Cholinergic receptor, nicotinic, alphapolypeptide 6 8p11.1 0.001992 Down Up 20. AV711904 LYZ Lysozyme (renalamyloidosis) 12q15 0.001992 Down Up 21. NM_013387 HSPC051Ubiquinol-cytochrome c reductase complex (7.2 kD) 22 0.001992 Up Down22. 
AW514210 HLA-F HLA CLASS I HISTOCOMPATIBILITY 6p21.3 0.001992 DownUp ANTIGEN, F A 23. NM_005032 PLS3 Plastin 3 (T isoform) Xq24 0.002894Down Up 24. NM_003407 TTP, GOS24, Zinc finger protein 36, C3H type,homolog (mouse) 19q13.1 0.002894 Down Up NUP475 25. NM_000165 GJA1 Gapjunction protein, alpha 1, 43 kD (connexin 43) 6q21-q23.2 0.002894 DownUp 26. AF275945 EVA1 Epithelial V-like antigen 1 11q23.3 0.002894 DownUp 27. NM_002450 MT1 Metallothionein 1L 16q13 0.002894 Down Up 28.NM_005950 MT1 Metallothionein 1G 16q13 0.002894 Down Up 29. NM_006994BTN3A3 Butyrophilin, subfamily 3, member A3 6p21.33 0.002894 Down Up 30.AI049962 KIAA0191 H. sapiens cDNA clone IMAGE: 1700970 1 0.002894 DownUp 31. X99268 TWIST Twist (Drosophila) homolog 7p21.2 0.002894 Up Down32. NM_016021 NCUBE1 Non-canonical ubquitin conjugating enzyme 1 60.002894 Up Down 33. NM_016205 SCDGF Platelet derived growth factor C4q32 0.002894 Up Down 34. AI681120 RANBP2 H. sapiens cDNA clone IMAGE:2272403 2q11-q13 0.004205 Up Down 35. NM_000574 CR, TC, CD55 Decayaccelerating factor for complement 1q32 0.004205 Down Up 36. NM_014937KIAA0966 Sac domain-containing inositol phosphatase 2 10q26.13 0.004205Down Up 37. NM_005213 STF1, STFA Cystatin A (stefin A) 3q21 0.004205Down Up 38. NM_005952 MT1 Metallothionein 1X 16q13 0.004205 Down Up 39.AF130095 FN1 Fibronectin 1 2q34 0.004205 Down Up 40. BE568219 PDE8A H.sapiens cDNA clone IMAGE: 3683966 15q25.1 0.004205 Up Down 41. D50925STK37, PAS-serine/threonine kinase 2q37.3 0.004205 Down Up PASKIN, 42.NM_006113 VAV3 Vav 3 oncogene 1p13.1 0.004205 Down Up 43. NM_001018RPS15 Ribosomal protein S15 19p13.3 0.006189 Up Down 44. NM_021038 MBNLMuscleblind (Drosophila)-like 3q25 0.006189 Down Up 45. NM_012323 U-MAFV-maf musculoaponeurotic fibrosarcoma, protein F 22q13.1 0.006189 DownUp 46. NM_005138 SCO1L SCO (cytochrome oxidase deficient, yeast)22q13.33 0.006189 Down Up homolog 2 47. AF186779 KIAA0959 RalGDS-likegene 1q25.2 0.006189 Down Up 48. 
D26054 FBP Fructose-1,6-bisphosphatase1 9q22.3 0.006189 Up Down 49. U37546 API2, MIHC, Baculoviral IAPrepeat-containing 3 11q22 0.006189 Down Up HIAP1 50. AB046845 SMURF1 E3ubiquitin ligase SMURF1 7q21.1-q31.1 0.006189 Down Up

TABLE 4 The first 50 genes obtained from the supervised MDS analysis oftumor versus benign tissues of all the moderate risk CaP patients,ranked by p-value: (T vs N Intensities of 9 MR) Expression GenbankCommon Name Regulation No. Accession of Genes Description of Genes Mapp-Value Tumor Benign 1. NM_014324 AMACR Alpha-methylacyl-CoA racemase5p13.2-q11.1 0 Up Down 2. NM_006457 ENH LIM protein (similar to ratprotein kinase C-binding 4q22 0.000009 Up Down enigma) 3. AI351043 ESTsH. sapiens cDNA clone IMAGE: 1948310 21 0.000011 Up Down 4. AI433463 MMEH. sapiens cDNA clone similar to NEPRILYSIN 3q25.1-q25.2 0.000028 DownUp (HUMAN) 5. BE256479 HSPD1 H. sapiens cDNA clone IMAGE: 335203112p13.31 0.000037 Up Down 6. NM_015900 PS-PLA1Phosphatidylserine-specific phospholipase A1alpha 3q13.13-q13.2 0.000083Up Down 7. NM_002343 LTF Lactotransferrin 3q21-q23 0.000083 Down Up 8.NM_001099 PAP Acid phosphatase, prostate 3q21-q23 0.000083 Down Up 9.T15991 CHRM3 IB2413 Infant brain, Bento Soares Homo sapiens cDNA1q41-q44 0.00011 Up Down 10. NM_005084 PAFAH Phospholipase A2, group VII6p21.2-p12 0.00011 Up Down 11. NM_004503 HOXC6 Homeo box C6 12q12-q130.00011 Up Down 12. N74607 AQP3 H. sapiens cDNA clone IMAGE: 296424 9p130.000149 Down Up 13. BC003068 SLC19A1 Solute carrier family 19 (folatetransporter), member 1 21q22.3 0.000149 Up Down 14. M21535 ERG(ets-related ERG v-ets erythroblastosis virus E26 oncogene like 21q22.30.000149 Up Down gene) (avian) 15. NM_013451 MYOF, Fer-1 (C.elegans)-like 3 (myoferlin) 10q24 0.0002 Down Up 16. NM_006017 AC133,CD133 Prominin (mouse)-like 1 4p15.33 0.0002 Down Up 17. BE550599CACNA1D H. sapiens cDNA clone IMAGE: 3220210 3p14.3 0.0002 Up Down 18.U22178 PSP57, PSP94 Microseminoprotein, beta- 10q11.2 0.0002 Down Up 19.NM_015865 JK, UT1, UTE Solute carrier family 14 (urea transporter),member 1 18q11-q12 0.000275 Down Up 20. NM_000441 PDS, DFNB4 Solutecarrier family 26, member 4 7q31 0.000275 Down Up 21. 
AA877789 MYO6 H.sapiens cDNA clone IMAGE: 1161091 6q13 0.000275 Up Down 22. AI356398ZFP36L2 H. sapiens cDNA clone IMAGE: 2028039 2 0.000275 Down Up 23.BC000915 CLIM1, CLP36 PDZ and LIM domain 1 (elfin) 10q22-q26.3 0.000275Down Up 24. NM_000286 PEX12 Peroxisomal biogenesis factor 12 17q11.20.000275 Up Down 25. NM_003671 CDC14B1, Homo sapiens CDC14 cell divisioncycle 14 homolog B 9q22.2-q22.31 0.000386 Down Up CDC14B2, (S.cerevisiae) (CDC14B), transcript variant 1, mRNA 26. NM_016545 SBB148Immediate early response 5 1q24.3 0.000386 Down Up 27. NM_002443 PSP57,PSP94 Microseminoprotein, beta- 10q11.2 0.000386 Down Up 28. NM_004999DFNA22 Myosin VI 6q13 0.000386 Up Down 29. X99268 TWIST Twist(Drosophila) homolog 7p21.2 0.000386 Up Down 30. NM_023009 MACMARCKSMacrophage myristoylated alanine-rich C kinase 1p34.3 0.000386 Up Downsubstrate 31. AI721219 TRAF3 as68b11.x1 Barstead colon HPLRB7 Homosapiens 14q32.33 0.000547 Down Up cDNA clone IMAGE: 2333853 3′, mRNAsequence. 32. NM_001584 D11S302E Chromosome 11 open reading frame 811p13 0.000547 Down Up 33. NM_018846 SBBI26 SBBI26 protein 7p15.30.000547 Up Down 34. M87771 BEK, KGFR, Fibroblast growth factor receptor2 10q26 0.000547 Down Up 35. AF275945 EVA1 Epithelial V-like antigen 111q23.3 0.000547 Down Up 36. AI791860 ESTs H. sapiens cDNA clone IMAGE:1011110 0.000547 Up Down 37. BC001282 NHC High-mobility group(nonhistone chromosomal) protein 6p21.3 0.000547 Down Up 17-like 3 38.NM_002015 FKH1, FKHR Forkhead box O1A (rhabdomyosarcoma) 13q14.10.000547 Down Up 39. X15306 NF-H H. sapiens NF-H gene, exon 1 (andjoined CDS). 22q12.2 0.000547 Down Up 40. BE965029 EST H. sapiens cDNAclone IMAGE: 3886131 11 0.000775 Up Down 41. NM_002275 KRT15 Keratin 1517q21 0.000775 Down Up 42. NM_001511 MGSA, CXCL1 GRO1 oncogene 4q210.000775 Down Up 43. NM_005213 STF1, STFA Cystatin A (stefin A) 3q210.000775 Down Up 44. NM_007191 WIF-1 Wnt inhibitory factor-1 12q14.20.000775 Down Up 45. 
H15129 MEIS3 EPIDERMAL GROWTH FACTOR-LIKE CRIPTO 170.000775 Down Up PROTEIN 46. AW452623 EST H. sapiens cDNA clone IMAGE:3068608 13 0.000775 Up Down 47. X90579 EST H. sapiens DNA for cyprelated pseudogene 7 0.000775 Down Up 48. BC001388 ANX2, Annexin A215q21-q22 0.001116 Down Up ANX2L4 49. NM_014863 BRAG, B cell RAGassociated protein 10q26 0.001116 Down Up 50. NM_021076 NEFHNeurofilament, heavy polypeptide (200 kD) 22q12.2 0.001116 Down Up

TABLE 5 Top 50 Upregulated Genes in All the 18 Samples (HR and MR)obtained from Tumor over Benign (T/B) ratio. Genbank T/N No ID RatioCommon Name of Genes Description Map 1. AF047020 39.86910 AMACRAlpha-methylacyl-CoA racemase 5p13.2-q11.1 2. M54886 20.86411 LOC51334Mesenchymal stem cell protein DSC54 5p13.1 3. AF070581 19.07263 ESTsHomo sapiens cDNA clone IMAGE: 1948310 21 4. NM_014324 18.04841 TRG@ Tcell receptor gamma locus 7p15-p14 5. NM_001669 15.98177 NPYNeuropeptide Y 7p15.1 6. NM_018360 13.34037 HOXC6 Homeo box C6 12q12-q137. AF092132 9.588665 IMPD2 IMP (inosine monophosphate) dehydrogenase 23p21.2 8. NM_023067 7.712272 HSPC028 HSPC028 protein 7p21.2 9. NM_0144397.031155 LTBP1 Latent transforming growth factor beta binding protein 12p22-p21 10. AI613045 6.739595 GMF Glia maturation factor, beta 14q22.111. AB051446 6.563991 DSC2 HUMAN Q02487 DESMOCOLLIN 2A/2B PRECURSOR18q12.1 12. NM_005342 6.442383 TRG, TCRG T cell receptor gamma locus7p15-p14 13. D87012 6.327042 PAWR H. sapiens cDNA clone IMAGE: 195086212q21 14. NM_018221 6.098105 SNX2 Sorting nexin 2 5q23 15. NM_0051145.769173 HS3ST1 Heparan sulfate (glucosamine)-3-O-sulfotransferase 1 1116. NM_022831 5.624385 RA70, SAPS, SKAP55R Src family associatedphosphoprotein 2 7p21-p15 17. NM_014324 5.621786 TRG, TCRG T cellreceptor gamma locus 7p15-p14 18. NM_006820 5.550019 BICD1 Bicaudal D(Drosophila) homolog 1 12p11.2-p11.1 19. NM_005574 5.454622 FOLH1 Folatehydrolase (prostate-specific membrane antigen) 1 11p11.2 20. AL3653435.451875 KIAA0615 Homo sapiens mRNA for KIAA0615 protein, complete cds.16q11.2 21. NM_022580 5.318270 TBCE Tubulin-specific chaperone e 1q42.322. AK022765 5.315669 CLDN8 Claudin 8 21 23. AF067173 5.272626 P21,NSG1, D4S234 Neuron-specific protein 4p16.3 24. NM_006220 5.180025 SHMT2Homo sapiens cDNA clone IMAGE: 2676158 12q12-q14 25. AL133600 5.146792ANK2 Homo sapiens cDNA clone by03a08 4q25-q27 26. AY009108 5.097967 PSMPROSTATE-SPECIFIC MEMBRANE ANTIGEN 2 (HUMAN) 27. 
AL035603 5.076761FLJ10907 Ribonuclease 6 precursor 6q27 28. NM_014017 5.058610 MAPBPIPMitogen-activated protein-binding protein-interacting 13 protein 29.BF247098 5.030722 PHLP, Phosducin-like 9q12-q13 DKFZp564M1863 30. U622964.992345 GOLPH2 Golgi phosphoprotein 2 9 31. AF130082 4.988912 EST Homosapiens clone FLC1492 PRO3121 mRNA, complete cds 32. NM_020373 4.969535C8orf4 Chromosome 8 open reading frame 4 8 33. U90030 4.873056 BICD1Bicaudal D homolog 1 (Drosophila) 6 34. NM_021071 4.821960 KIAA0426KIAA0426 gene product 6p22.2-p21.3 35. NM_030817 4.753895 KIAA1157KIAA1157 protein 12q13.3-q14.1 36. NM_019844 4.700642 HPRT, HGPRTHypoxanthine phosphoribosyltransferase 1 Xq26.1 37. NM_004721 4.689246RPL29 Ribosomal protein L29 3p21.3-p21.2 38. NM_004866 4.669274 EF2,EEF-2, Eukaryotic translation elongation factor 2 19pter-q12 39.NM_014501 4.610132 BGN Biglycan Xq28 40. NM_020655 4.575193 SDC2Syndecan 2 (heparan sulfate proteoglycan 1, fibroglycan) 8q22-q23 41.NM_006716 4.557526 ASK Activator of S phase kinase 19p13.11 42.NM_002968 4.541752 FOLH1 Folate hydrolase (prostate-specific membraneantigen) 1 11q14.3 43. X06268 4.539479 NCUBE1 Non-canonical ubquitinconjugating enzyme 1 6 44. AK021609 4.520464 PTH2, PTEN2, Phosphataseand tensin homolog (mutated in multiple 9p21 PSIPTEN advanced cancers1), pseudogene 1 45. NM_001133 4.479513 TCTEX1LT-complex-associated-testis-expressed 1-like Xp21 46. D38491 4.477160KIAA0461, POGZ, Pogo transposable element with ZNF domain, KIAA04611q21.2 protein 47. NM_006426 4.385531 DDX26 Deleted in cancer 1; RNAhelicase HDB/DICE1 13q14.12-q14.2 48. AW058148 4.347362 SPHAR S-phaseresponse (cyclin-related) 1q42.11-q42.3 49. U55209 4.293919 MYO7A myosinVIIA (Usher syndrome 1B) 4 50. NM_004610 4.275521 KIAA0634, ASTN2Astrotactin 2 9q33.1

TABLE 6 Top 35 Downregulated Genes in All the 18 Samples (HR and MR) obtained from Tumor over Benign (T/B) ratio.
No. | Genbank ID | T/N Ratio | Common Name of the Genes | Description | Map
1. | X90579 | 0.181138 | CYP3A5 | Cytochrome P450, family 3, subfamily A, polypeptide 5 | 7
2. | NM_005213 | 0.198502 | STF1, STFA | Cystatin A (stefin A) | 3q21
3. | NM_005864 | 0.254524 | EFS1, HEFS | Signal transduction protein (SH3 containing) | 14q11.2-q12
4. | X15306 | 0.291665 | NF-H | H. sapiens NF-H gene, exon 1 (and joined CDS). | 22q12.2
5. | BE908217 | 0.319347 | ANXA2 | Annexin A2 | 15q21-q22
6. | BC001388 | 0.320110 | ANX2, LIP2, ANX2L4 | Annexin A2 | 15q21-q22
7. | U22178 | 0.326560 | PSP57, PSP94, PSP-94 | Microseminoprotein, beta- | 10q11.2
8. | NM_002443 | 0.338948 | PSP57, PSP94, PSP-94 | Microseminoprotein, beta- | 10q11.2
9. | NM_021076 | 0.359039 | NEFH | Neurofilament, heavy polypeptide (200 kD) | 22q12.2
10. | AI433463 | 0.360636 | MME, CD10, NEP, CALLA | Neprilysin | 3q25.1-q25.2
11. | AF275945 | 0.366939 | EVA1 | Epithelial V-like antigen 1 | 11q23.3
12. | NM_002343 | 0.370305 | LTF | Lactotransferrin | 3q21-q23
13. | NM_013451 | 0.378555 | MYOF, KIAA1207 | Fer-1 (C. elegans)-like 3 (myoferlin) | 10q24
14. | NM_001584 | 0.385272 | 239FB, D11S302E | Chromosome 11 open reading frame 8 | 11p13
15. | AL390736 | 0.391520 | BA209J19.1, GW112 | GW112 (differentially expressed in hematopoietic lineages) |
16. | NM_000441 | 0.392117 | PDS, DFNB4 | Solute carrier family 26, member 4 | 7q31
17. | AL031602 | 0.399115 | ESTs | ESTs | 1p34.1-35.3
18. | NM_004039 | 0.399796 | ANXA2 | Annexin A2 | 15q21-q22
19. | NM_001546 | 0.402261 | ID4 | DNA binding inhibitor protein of ID-4 | 6p22-p21
20. | NM_001099 | 0.406234 | PAP | Acid phosphatase, prostate | 3q21-q23
21. | X57348 | 0.422692 | 9112 | H. sapiens mRNA (clone 9112). | 1p35.2
22. | NM_020139 | 0.440648 | LOC56898 | Oxidoreductase UCPA | 4
23. | AU148057 | 0.444528 | DKK3, REIC | Dickkopf related protein-3 precursor (Dkk-3) (Dickkopf-3) (hDkk-3) | 11pter-p15.5
24. | BF059159 | 0.446108 | ROBO1, DUTT1, SAX3 | Roundabout, axon guidance receptor, homolog 1 (Drosophila) | 3p12
25. | BC001120 | 0.448109 | MAC2, GALBP, MAC-2, | Lectin, galactoside-binding, soluble, 3 (galectin 3) | 14q21-q22
26. | N74607 | 0.451123 | AQP3 | Aquaporin 3 | 9p13
27. | NM_013281 | 0.454835 | FLRT3 | Fibronectin leucine rich transmembrane protein 3 | 20p11
28. | NM_000700 | 0.456566 | ANX1, LPC1 | Annexin A1 | 9q12-q21.2
29. | X57348 | 0.458169 | 9112 | H. sapiens mRNA (clone 9112). | 1p35.2
30. | AI356398 | 0.467028 | ZFP36L2, ERF-2, TIS11D | EGF-response factor 2 | 2
31. | AF016266 | 0.467787 | DR5, TRAILR2, TRICK2A, | Tumor necrosis factor receptor superfamily, member 10b | 8p22-p21
32. | S59049 | 0.467913 | BL34, IER1, IR20 | Regulator of G-protein signalling 1 | 1q31
33. | NM_000165 | 0.470393 | GJA1 | Gap junction protein, alpha 1, 43 kD (connexin 43) | 6q21-q23.2
34. | AI826799 | 0.471081 | EFEMP1, DRAD, FBNL | EGF-CONTAINING FIBULIN-LIKE EXTRACELLULAR MATRIX PROTEIN 1 | 2p16
35. | AL575509 | 0.476538 | ETS2 | V-ets erythroblastosis virus E26 oncogene homolog 2 (avian) | 21q22.2

Classification between Tumor and Benign Prostate Epithelium:

A class prediction analysis using distance-based multidimensional scaling (MDS) was used to determine expression differences between tumor and benign epithelial cells in 18 patients who had undergone radical prostatectomy. All genes that met a minimum level of expression were included in the analysis. We used the normalized intensities of all 18 tumor and 18 normal samples to determine whether tumor-specific and benign-tissue-specific gene expression profiles could be differentiated among the 18 patients. Using a matrix of Pearson correlation coefficients from the complete pair-wise comparison of all the experiments, we observed a significant overall difference in gene expression pattern between the tumor and benign tissue, displayed as a two-dimensional MDS plot in FIG. 2A. The position of each tumor and benign sample is displayed in the MDS plot in two-dimensional Euclidean space, with the distance among the samples reflecting both the correlation among the samples within each individual group (distance within the cluster) and the separation between the two groups (center-to-center distance) (FIG. 2A). The MDS plot was obtained from the top 200 genes identified by 10,000 permutations of the tumor and benign intensities of 4566 genes. Of these 200 genes that define the tumor-specific alteration of gene expression, 53 had higher expression in the tumor samples and the remaining 147 had higher expression in the benign samples. A partial list of genes that distinctly discriminate the tumor and benign samples from all 18 patients is shown in Table 1. We also performed a hierarchical clustering analysis using the 200 discriminatory genes. The hierarchical clustering algorithm produced a dendrogram that identified two major distinct clusters of 16 tumor samples and 17 benign samples (FIG. 2B).
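
The pair-wise Pearson correlation matrix that feeds the MDS plot can be illustrated with a minimal sketch. The sample names and intensity values below are hypothetical, and converting a correlation r into a distance as 1 - r is an assumption made for illustration; the text does not state the exact distance transform that was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance_matrix(profiles):
    """Pair-wise distances d = 1 - r between samples (assumed transform)."""
    n = len(profiles)
    return [
        [0.0 if i == j else 1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical normalized intensities: one list of gene intensities per sample.
samples = {
    "tumor_1": [8.0, 7.5, 2.0, 1.5],
    "tumor_2": [7.8, 7.9, 2.2, 1.1],
    "benign_1": [2.1, 1.8, 7.7, 8.2],
    "benign_2": [1.9, 2.3, 8.1, 7.6],
}
names = list(samples)
dist = correlation_distance_matrix([samples[name] for name in names])

for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

In this toy data the within-group distances are close to 0 and the between-group distances are close to 2, which is the pattern an MDS projection would render as two separated clusters.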

Classification of CaP into HR and MR Groups Using the Ratio of Tumor Over Benign Gene Expression Intensities

We used the tumor over benign gene expression intensity ratio (T/B ratio) (FIG. 3A) from the HR (9 patients) and MR (9 patients) groups for a class prediction analysis using the distance-based MDS method to determine whether the 18 patients could be differentiated into the two patient groups. Pathological and clinical features of the 18 tumors used in our study were clearly distinguishable between the HR and MR groups. We observed a significant overall difference in expression pattern between the HR and MR groups. The distance between the samples reflects both the extent of correlation within each selected group (distance within the cluster) and the separation between the two selected groups (center-to-center distance) (FIG. 3A). The MDS plot obtained from the top 200 genes identified by 10,000 permutations of the 4868 genes based on the T/B ratio is shown in FIG. 3A. Of the top 200 genes of the MDS analysis, 135 were over expressed in the HR group and 65 were over expressed in the MR group. The top 50 genes with the best p-values identified by the T/B ratio based MDS analysis discriminating the HR and MR groups are listed in FIG. 3B. The approach we used to interpret the discrimination between the HR and MR groups was empirical. The ‘weighted list’ (FIG. 3B) consists of individual genes whose variance of change across all the tumor samples defines the boundary of a given cluster, and thereby predicts a class that correlates with the pathological and clinical features of CaP. We also performed hierarchical clustering to verify the results of the MDS analysis and to test, using a second analytical approach, the potential of those 200 genes to predict class/group (HR or MR). The resulting hierarchical dendrogram of the T/B ratio demonstrates that the 9 samples of the HR group formed a very distinct and tight cluster, as did the 9 samples of the MR group (FIG. 3B).
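
Ranking genes by a permutation p-value can be sketched as follows. This is an illustration under stated assumptions, not the analysis software that was used: the test statistic (absolute difference in mean T/B ratio between the HR and MR groups), the gene names, and the ratio values are all hypothetical.

```python
import random

def permutation_p_value(values, labels, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the HR-vs-MR difference in means."""
    rng = random.Random(seed)

    def mean_diff(lbls):
        hr = [v for v, l in zip(values, lbls) if l == "HR"]
        mr = [v for v, l in zip(values, lbls) if l == "MR"]
        return sum(hr) / len(hr) - sum(mr) / len(mr)

    observed = abs(mean_diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

labels = ["HR"] * 5 + ["MR"] * 5
# Hypothetical T/B ratios for two genes across 10 patients.
tb_ratios = {
    "gene_A": [4.1, 3.8, 5.0, 4.4, 3.9, 1.0, 1.2, 0.9, 1.1, 1.3],
    "gene_B": [1.0, 2.0, 1.5, 0.8, 1.9, 1.1, 1.8, 1.6, 0.9, 2.0],
}
p_values = {g: permutation_p_value(v, labels) for g, v in tb_ratios.items()}
ranked = sorted(p_values, key=p_values.get)  # best (smallest) p-value first
print(ranked, p_values)
```

Taking the first 200 entries of such a ranking would correspond to selecting the top 200 discriminating genes.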

Classification of CaP into HR and MR Groups Based on Gene Expression Intensities in Tumor Cells

MDS analysis was used to determine whether the 18 patients could be differentiated into HR and MR groups. An overall difference in tumor-specific expression between the HR and MR groups is displayed as a two-dimensional MDS plot (FIG. 3C). The MDS plot, obtained from 10,000 permutations of the gene expression intensities of 4115 genes from the tumor samples of the 18 patients, differentiated them into HR and MR groups based on the selected top 200 genes (FIG. 3C). Of these 200 genes, 94 had higher expression in the HR group and the remaining 106 had higher expression in the MR group. We performed a hierarchical clustering analysis using the 200 discriminatory genes obtained from the supervised MDS analysis. The resulting hierarchical dendrogram of the 18 tumor samples demonstrates that the 9 tumor samples of the HR group and the 9 tumor samples of the MR group were separated into two tight clusters (FIG. 3D). The approach we used to obtain a ‘gene cluster’ interpretation discriminating the HR and MR groups, based on the linear correlation of global gene expression in FIG. 3, was empirical. Genes that discriminate the HR and MR groups are shown in Table 7.

TABLE 7 Top 17 genes analysis based on T/B fold change of HR vs MR groups
No. | Gene Bank ID | Common Name | Description | Map | p-Value | HR | MR | Absent | Positive
1 | NM_004522 | KINN, NKHC | Kinesin family member 5C | 2q23.3 | 0.0001 | Up | Down | 4 | 60%
3 | NM_018010 | HIPPI, FLJ10 | Hypothetical protein FLJ10147 | 3q13.13 | 0.0033 | Up | Down | 0 | 56%
10 | NM_012245 | SKIP, NCOA- | SKI-interacting protein | 14q24.3 | 0.0076 | Up | Down | 2 | 42.80%
11 | NM_015895 | LOC51053 | Geminin | 6p22.2 | 0.0076 | Up | Down | 2 | 71.40%
14 | NM_003031 | SIAH1 | Seven in absentia (Drosophila) homolog 1 | 16q12 | 0.0076 | Up | Down | 3 | 66%
42 | NM_000016 | ACADM | Acyl-Coenzyme A dehydrogenase, C-4 to C-12 straight | 1p31 | 0.0178 | Up | Down | 1 | 75%
47 | NM_025087 | FLJ21511 | Hypothetical protein FLJ21511 | 4 | 0.0178 | Up | Down | 1 | 50%
17 | NM_021038 | MBNL | Muscleblind (Drosophila)-like | 3q25 | 0.0076 | Down | Up | 2 | 71
25 | NM_006732 | GOS3 | FBJ murine osteosarcoma viral oncogene homolog B | 19q13.32 | 0.0076 | Down | Up | 3 | 83%
51 | NM_001674 | ATF3 | Activating transcription factor 3 | 1q32.3 | 0.0178 | Down | Up | 0 | 100%
7 | NM_002053 | GBP1 | Guanylate binding protein 1, interferon-inducible, 67 KD | 1p22.1 | 0.005 | Down | Up | 4 | 83%
15 | NM_003407 | TTP, GOS24 | Zinc finger protein 36, C3H type, homolog (mouse) | 19q13.1 | 0.0076 | Down | Up | 1 | 62%
26 | NM_003760 | EIF4G3 | Eukaryotic translation initiation factor 4 gamma, 3 | 1pter-p3 | 0.0076 | Down | Up | 4 | 40%
38 | AK023938 | Homo sapien | SELECTED MODEL ORGANISM PROTEIN SIMILARITIES | 2q37.3 | 0.0178 | Down | Up | 4 | 80%
45 | NM_016021 | NCUBE1 | Non-canonical ubquitin conjugating enzyme 1 | 6 | 0.0178 | Down | Up ? | 3 | 66%
5 | NM_021795 | SAP1 | ELK4, ETS-domain protein (SRF accessory protein 1) | 1q32 | 0.005 | Up | Down | 4 | 80%
18 | NM_014454 | PA26 | P53 regulated PA26 nuclear protein | 6q21 | 0.0076 | Up | Down | 3 | 83%

Classification of CaP into High Risk and Medium Risk Groups Based on Gene Expression Intensities in Benign Prostate Epithelial

We applied the same MDS and cluster analysis used for the tumor versus tumor comparison to the normalized intensities of the 9 benign samples of the HR group and the 9 benign samples of the MR group for class prediction. Strikingly, the MDS plot of the benign samples showed distinct separation between the HR and MR groups (FIG. 3E). We observed a significant overall difference in expression pattern between the HR and MR groups. The MDS plot was obtained from the top 200 genes identified by 10,000 permutations of the 3358 genes from the benign versus benign intensities (FIG. 3E). Of these 200 genes, 61 were over expressed in benign samples of the HR group and the remaining 139 were over expressed in the MR group. The ‘weighted list’ consists of individual genes whose variance of expression alteration across all the normal samples reflects the capability of a given cluster to predict classification. The hierarchical clustering algorithm identified a similar major cluster of the 9 benign samples of the HR group and a cluster of the 9 benign samples of the MR group.

The weighted gene analysis by the distance-based supervised multidimensional scaling method (depicted in FIGS. 3A, 3C, and 3E) used the gene expression ratio of tumor over benign intensities, the gene expression intensities of the tumor samples, and the gene expression intensities of the normal samples to obtain a ‘weighted list’ of individual genes whose variance of change across all the tumor and benign samples distinctly delineates the boundary of a given cluster, and thereby predicts a class that correlates with the pathological and clinical features of CaP.

Independent in Silico Cross Validation

In silico analysis of the predicted classifier was carried out using two independent data sets. The HR and MR groups were selected on the basis of Gleason score, as that was the only criterion available for these data. At least 200 genes were extracted from all the MDS analyses (see Methods for a detailed description). This subset of 200 classifier genes was found in the data of Welsh et al. 2001 and Singh et al. 2002. The same MDS analysis described above (p<0.001 as measured by 10,000 permutation testing) was performed using the expression intensities of these 200 genes from the Welsh and Singh data. MDS analysis using the tumor over benign ratio of as few as 50 genes from the subset of 200 genes clearly separated samples from the HR group and samples from the MR group in both the Welsh data (FIG. 4A) and the Singh data (FIG. 4B). This observation indicates that the differential expression profile of this small set of genes can be used to predict the identity, class, or group of unknown prostate cancer samples on the basis of their clinico-pathological features. The outcome of this analysis also indicates that the expression profile of this small number of genes is conserved across the independent data sets.

Validation of GeneChip Results by Real-Time PCR

To further validate the expression alterations of genes identified by GeneChip analysis with an indicated biological relevance to prostate cancer, primers and probes were obtained for real-time PCR analysis of AMACR and GSTP1. These genes were chosen for validation because several investigators have previously reported that AMACR is elevated and GSTP1 is decreased in CaP. Down-regulation of the GSTP1 gene, together with up-regulation of AMACR, was observed in 18 of 20 samples (FIG. 1); the other two samples did not show a significant change (fold change less than 1.5).

One ng of total RNA from each paired tumor and normal specimen was reverse-transcribed using the Omnisensecript RT-kit (Qiagen, Valencia, Calif.) according to the manufacturer's protocol.

Quantitative gene expression analysis was performed using TaqMan Master Mix Reagent and an ABI Prism 7700 Sequence Detection System (PE Applied Biosystems, Foster City, Calif.). All primer and probe sets for the tested genes were Assays-on-Demand Gene Expression products obtained from PE Applied Biosystems. The expression of the housekeeping gene GAPDH was analyzed simultaneously as the endogenous control in the same batch of cDNA, and the target gene expression of each sample was normalized to GAPDH. For each PCR run, a master-mix was prepared on ice with 1× TaqMan Master Mix, 1× target gene primer/probe, and 1× GAPDH primer/probe. Two microliters of each diluted cDNA sample was added to 280 of PCR master-mix. The thermal cycling conditions comprised an initial denaturation step at 95° C. for 10 minutes and 50 cycles at 95° C. for 15 seconds and 60° C. for 1 minute. RNA samples without reverse transcription were included as the negative control in each assay. All assays were performed in duplicate. Results were plotted as the average C_(T) (threshold cycle) of duplicated samples. The relative gene expression level was presented as the “Fold Change” of tumor versus matched normal cells, calculated as: Fold change = 2^((ΔC_(T) normal − ΔC_(T) tumor)), where ΔC_(T) is the C_(T) value of the target gene normalized to GAPDH.
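
The fold-change calculation above can be worked through in a short sketch. The C_(T) values are hypothetical, and the function names are illustrative rather than part of any instrument software; ΔC_(T) is taken as the target gene C_(T) minus the GAPDH C_(T), each averaged over duplicate wells.

```python
def mean(values):
    """Average C_T of duplicate wells."""
    return sum(values) / len(values)

def delta_ct(ct_target, ct_gapdh):
    """C_T of the target gene normalized to the GAPDH endogenous control."""
    return mean(ct_target) - mean(ct_gapdh)

def fold_change(tumor_target, tumor_gapdh, normal_target, normal_gapdh):
    """Fold change = 2 ** (delta C_T normal - delta C_T tumor)."""
    d_tumor = delta_ct(tumor_target, tumor_gapdh)
    d_normal = delta_ct(normal_target, normal_gapdh)
    return 2 ** (d_normal - d_tumor)

# Hypothetical duplicate C_T readings for one matched tumor/normal pair.
fc = fold_change(
    tumor_target=[24.5, 25.5],   # mean 25.0
    tumor_gapdh=[18.0, 18.0],    # mean 18.0, so delta C_T tumor = 7.0
    normal_target=[28.0, 28.0],  # mean 28.0
    normal_gapdh=[18.0, 18.0],   # mean 18.0, so delta C_T normal = 10.0
)
print(fc)  # 2 ** (10.0 - 7.0)
```

A lower ΔC_(T) in tumor than in normal cells means the target reaches threshold earlier relative to GAPDH, so the fold change is greater than 1.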

Example 3 Distinguishing Between ERG1 and ERG2 Isoforms

The Affymetrix GeneChip probe set (213541_s_at) and the TaqMan probes used in the experiments described above recognize a region specific for both the ERG1 and ERG2 isoforms (FIG. 6), but exclude isoforms 3 to 9. Although other primers and probes could be used, by way of example, the TaqMan primers and probe recognizing both ERG1 and ERG2, but not other ERG isoforms, were as follows:

Forward primer: (SEQ ID NO: 7) 5′-AGAGAAACATTCAGGACCTCATCATTATG-3′
Reverse primer: (SEQ ID NO: 8) 5′-GCAGCCAAGAAGGCCATCT-3′
Probe: (SEQ ID NO: 9) 5′-TTGTTCTCCACAGGGT-3′

The probe has the reporter dye, 6-FAM, attached to the 5′ end and TAMRA attached to the 3′ end. The 3′-TAMRA effectively blocks extension during PCR.

To further distinguish between these two ERG isoforms, the expression of the ERG1 and ERG2 isoforms was tested in PC3 cells and in normal prostate tissue (pooled prostate RNA from 20 men, Clontech), as well as in microdissected tumor and normal prostate epithelial cells from 5 CaP patients (data not shown). Only ERG1 was expressed in the prostate cells and in PC3 cells. ERG2 expression was not detectable. A TaqMan QRT-PCR probe and primers were designed that specifically recognize only the ERG1 isoform (FIG. 6). Although other primers and probes could be used, by way of example, we designed TaqMan primers and a probe recognizing only the ERG1 isoform as follows:

Forward primer: (SEQ ID NO: 10) 5′-CAGGTCCTTCTTGCCTCCC-3′
Reverse primer: (SEQ ID NO: 11) 5′-TATGGAGGCTCCAATTGAAACC-3′
Probe: (SEQ ID NO: 12) 5′-TGTCTTTTATTTCTAGCCCCTTTTGGAACAGGA-3′

The probe has the reporter dye, 6-FAM, attached to the 5′ end and TAMRA attached to the 3′ end. The 3′-TAMRA effectively blocks extension during PCR.

ERG1 expression was determined in 228 RNA specimens from microdissected matched tumor and benign prostate epithelial cells of 114 CaP patients. Overall, 62.4% of the 114 CaP patients analyzed had significant over expression of the ERG1 isoform in their tumor cells (i.e., greater than 2 fold ERG1 expression in tumor versus benign cells), 16.6% had no detectable ERG1 expression, 15.0% had under expression of ERG1 (less than 0.5 fold difference in ERG1 expression in tumor versus benign cells), and 6.0% had no significant difference (0.5 to 2 fold difference in ERG1 expression between tumor and benign cells).
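
The four expression categories described above can be expressed as a small classifier. This is a minimal sketch: the function name is hypothetical, and representing undetectable expression as `None` is an assumption made for illustration. The thresholds follow the text, with fold changes of exactly 0.5 or 2 treated as no significant difference.

```python
def classify_erg1(fold_change):
    """Map a tumor-versus-benign ERG1 fold change to an expression category."""
    if fold_change is None:        # assumed marker for no detectable expression
        return "not detectable"
    if fold_change > 2:
        return "over expression"
    if fold_change < 0.5:
        return "under expression"
    return "no significant difference"  # 0.5 to 2 fold

# Hypothetical fold changes for five patients.
patients = {"P1": 142.2, "P2": 0.3, "P3": 1.4, "P4": None, "P5": 2.0}
categories = {pid: classify_erg1(fc) for pid, fc in patients.items()}
print(categories)
```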

In a further study, ERG expression was analyzed in 82 CaP patients. Using the TaqMan primers and probes discussed above, we observed tumor-associated over expression of ERG1 (isoform 1 only) and ERG (isoforms 1 and 2) in 63.4% and 72.0% of the patients, respectively. Therefore, ERG1 isoform specific expression may actually reflect an underestimate of the overall ERG expression in CaP.

Example 4 Correlation of ERG1 Expression with Various Clinico-Pathologic Features

Since the ERG1 tumor versus benign expression ratio data were not normally distributed, the Wilcoxon Rank Sum Test was used to analyze its relationship with various clinico-pathologic features, as shown in Table 8.

TABLE 8 Relationship of ERG1 expression ratios in tumor versus benign prostate epithelial cells with patient clinical factors

Clinical factors          N    Median of ERG1   Mean scores of ERG1   P
                               fold changes     fold changes
PSA recurrence                                                        0.0042
  No                      75   142.2            52.19
  Yes                     20   1.2              32.30
Tumor Differentiation                                                 0.0020
  Well & Moderate         40   362.3            57.62
  Poor                    54   13.9             40.00
Pathologic T stage                                                    0.0136
  pT2                     38   502.0            53.45
  pT3-4                   52   33.5             39.69
Margin status                                                         0.0209
  Negative                64   197.0            52.55
  Positive                31   20.4             38.61
Seminal vesicle                                                       0.2555
  Negative                82   106.7            49.28
  Positive                13   6.9              39.92
Race                                                                  0.0086
  Caucasian               73   172.1            52.08
  African American        22   3.8              34.45
Family history                                                        0.3887
  No                      70   106.7            49.46
  Yes                     25   4.8              43.92
Diagnostic PSA (ng/ml)                                                0.1801
  <=4                     13   101.3            57.15
  >4-10                   62   112.1            48.03
  >10                     19   20.5             39.16
Gleason sum                                                           0.2923
  <7                      33   112.0            52.06
  =7                      45   118.1            47.16
  >7                      16   21.0             39.06

As shown in Table 8, 95 CaP patients with detectable ERG1 expression were analyzed by Wilcoxon rank sum test. N represents the number of CaP patients falling into the indicated clinical factor category. Significant p values (<0.05) are in bold face.
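Although other statistical software could be used, by way of example, a two-group Wilcoxon rank sum comparison can be sketched as follows. The sketch assumes that the "mean scores" column of Table 8 reports mean rank scores. That reading is consistent with the PSA recurrence rows, where 75 × 52.19 + 20 × 32.30 is approximately 4560, the sum of the ranks 1 to 95. The normal approximation below omits tie and continuity corrections, so it is not necessarily the exact procedure used for Table 8, and the fold-change values are hypothetical.

```python
# Illustrative sketch: mean rank scores and a normal-approximation
# Wilcoxon rank sum test (no tie or continuity correction).
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (mean rank of A, mean rank of B, two-sided p value)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    sum_a = sum(ranks[:n_a])
    expected = n_a * (n + 1) / 2.0            # expected rank sum under H0
    sd = sqrt(n_a * n_b * (n + 1) / 12.0)     # standard deviation under H0
    z = (sum_a - expected) / sd
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return sum_a / n_a, sum(ranks[n_a:]) / n_b, p

# Hypothetical fold changes for two clinical groups, for illustration only.
no_recurrence = [142.0, 300.5, 80.2, 12.0, 510.0, 95.0]
recurrence = [1.2, 0.8, 15.0, 3.4]
print(rank_sum_test(no_recurrence, recurrence))
```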

We also found a significant correlation of high ERG1 over expression with Caucasian over African American ethnicity (p=0.0086) (Table 8). To further explore the correlation with PSA recurrence, Kaplan-Meier survival analysis was performed based on three patient groups: 1) CaP patients with tumor versus benign ERG1 expression ratio of less than 2 fold; 2) CaP patients with tumor versus benign ERG1 expression ratio of 2-100 fold; and 3) CaP patients with tumor versus benign ERG1 expression ratio of greater than 100 fold (FIG. 7). The results show that patients with higher ERG1 over expression in their prostate tumor tissue had significantly longer PSA recurrence-free survival (log rank test, P=0.0006) (FIG. 7). The 36-month PSA recurrence-free survival for patients with less than 2 fold ERG1 expression ratio (n=24) was 54.4%, while for patients with greater than 100 fold ERG1 expression ratio (n=47) it was 87.7%. From a univariate Cox proportional hazard ratio regression analysis for PSA recurrence-free time using ERG1 tumor/benign cells expression ratio, race, diagnostic PSA, Gleason sum, pathologic T stage, margin status, and seminal vesicle invasion status, we found that five of these variables (ERG1 tumor/benign cells expression ratio, Gleason sum, pathologic T stage, margin status, seminal vesicle invasion) had a significant p value (Table 9).
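By way of illustration, the Kaplan-Meier analysis described above can be sketched with a product-limit estimator applied to each of the three expression-ratio groups. The follow-up times, recurrence flags, and function names below are hypothetical; the handling of the boundary values 2 and 100 (both placed in the 2-100 fold group) is an assumption, since the text does not state it.

```python
# Illustrative sketch: Kaplan-Meier (product-limit) estimate of PSA
# recurrence-free survival. All data below are hypothetical.

def ratio_group(ratio):
    """Assign a tumor/benign ERG1 ratio to one of the three groups in the text."""
    if ratio < 2:
        return "<2 fold"
    if ratio <= 100:
        return "2-100 fold"
    return ">100 fold"

def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one recurrence.

    times: follow-up in months; events: True if PSA recurrence was
    observed at that time, False if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            recurrences += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def survival_at(curve, month):
    """Read the estimated recurrence-free survival at a given month."""
    current = 1.0
    for t, s in curve:
        if t > month:
            break
        current = s
    return current

# Hypothetical follow-up for one group: months and recurrence flags.
months = [5, 10, 10, 20, 40]
recurred = [True, False, True, True, False]
curve = kaplan_meier(months, recurred)
print(survival_at(curve, 36))
```

A log rank test comparing the group curves, as reported in the text, would be a separate step and is not shown in this sketch.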

TABLE 9 Correlation of clinical parameters and ERG1 expression ratios in tumor versus benign prostate epithelial cells with PSA recurrence-free time after radical prostatectomy

Factors                               Crude Hazard Ratio (95% CI)   P
ERG1 fold changes                                                   0.0024
  2-100 fold vs. <2 fold              0.291 (0.093-0.915)           0.0347
  >100 fold vs. <2 fold               0.173 (0.060-0.498)           0.0011
Race
  Caucasian vs. African American      1.092 (0.395-3.016)           0.8657
Diagnostic PSA                                                      0.8723
  >4-10 vs. <=4                       0.976 (0.275-3.468)           0.9705
  >10 vs. <=4                         1.285 (0.307-5.378)           0.7313
Gleason Sum                                                         0.0001
  7 vs. 2-6                           1.574 (0.393-6.296)           0.5215
  8-10 vs. 2-6                        9.899 (2.752-35.610)          0.0004
Pathologic T stage
  pT3/4 vs. pT2                       6.572 (1.517-28.461)          0.0118
Margin status
  Positive vs. Negative               2.825 (1.169-6.826)           0.0210
Seminal Vesicle
  Positive vs. Negative               3.792 (1.487-9.672)           0.0053

In Table 9, crude hazard ratios with 95% confidence interval are shown for ERG1 fold change (tumor versus benign) and six clinical parameter categories in a univariate Cox proportional hazard ratio analysis. Significant p values are in bold face. The multivariate Cox proportional hazard ratio regression analysis of the significant variables from the univariate analysis shows that ERG1 overexpression (greater than 100 fold vs. less than 2 fold: p=0.0239, RR=0.274, overall p value 0.0369), and Gleason sum (Gleason 8-10 vs. Gleason 2-6: p=0.0478, RR=4.078, overall p value 0.0148) are independent predictors of PSA recurrence after radical prostatectomy (Table 10). These results demonstrate that the status of ERG1 expression ratios (tumor vs. benign) in radical prostatectomy specimens carries a predictive value for patient prognosis.

TABLE 10

Factors                               Crude Hazard Ratio (95% CI)   P
ERG1 fold changes                                                   0.0369
  2-100 fold versus <2 fold           0.320 (0.097-1.059)           0.0620
  >100 fold versus <2 fold            0.274 (0.089-0.843)           0.0239
Gleason Sum                                                         0.0148
  7 versus 2-6                        0.948 (0.223-4.033)           0.9424
  8-10 versus 2-6                     4.078 (1.014-16.401)          0.0478
Pathologic T stage
  pT3/4 versus pT2                    3.306 (0.636-17.177)          0.1550
Margin status
  Positive versus Negative            1.116 (0.421-2.959)           0.8254
Seminal Vesicle
  Positive versus Negative            1.308 (0.466-3.670)           0.6098
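By way of example, the relationship between a hazard ratio, its 95% confidence interval, and the associated p value in Tables 9 and 10 can be illustrated with a Wald-type calculation. The sketch assumes that each interval is symmetric on the log scale and that the p value comes from a two-sided normal (Wald) test; the text does not state which test produced the reported values, and the function name is hypothetical.

```python
# Illustrative sketch: approximate a two-sided Wald p value from a hazard
# ratio and its confidence interval, assuming symmetry on the log scale.
from math import log
from statistics import NormalDist

def wald_p_from_ci(hazard_ratio, ci_low, ci_high, level=0.95):
    """Approximate the two-sided p value implied by a hazard ratio and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)    # about 1.96 for 95%
    standard_error = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = log(hazard_ratio) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# (label, hazard ratio, CI low, CI high, reported p) as listed in the tables.
rows = [
    ("Table 9, 2-100 fold vs. <2 fold", 0.291, 0.093, 0.915, 0.0347),
    ("Table 9, >100 fold vs. <2 fold", 0.173, 0.060, 0.498, 0.0011),
    ("Table 10, >100 fold versus <2 fold", 0.274, 0.089, 0.843, 0.0239),
]
for label, hr, low, high, reported in rows:
    print(f"{label}: approx p={wald_p_from_ci(hr, low, high):.4f}, reported p={reported}")
```

Under these assumptions the recomputed values fall close to the reported p values, which serves as a consistency check on the tabulated entries rather than a reproduction of the original analysis.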

ERG1 expression in prostate tumor tissue showed highly significant association with longer PSA recurrence free survival (p=0.0042), well and moderately differentiated grade (p=0.0020), lower pathologic T stage (p=0.0136), and negative surgical margin status (p=0.0209), suggesting that ERG1 over expression in tumor cells is generally higher in less aggressive CaP than in more aggressive CaP (Table 8).

The ERG1 over expression in tumor cells identified by GeneChip analysis and verified by real time QRT-PCR assays was further validated by in situ hybridization. Based on the real time QRT-PCR data, 6 patients with high ERG1 over expression in their tumor cells (and as a control one patient with no ERG1 over expression) were selected for in situ hybridization and quantitative image analysis in a blinded fashion. As expected, in each case the in situ expression data confirmed the over expression of ERG1 in the tumor epithelial cells (FIG. 8).

Example 5 Generation and Characterization of ERG Antibody

Cloning of ERG1 into Tetracycline regulated mammalian expression vectors:

ERG1 cDNA was subcloned into tetracycline-regulated mammalian expression vectors (pTet-off, EC1214A). The constructs generated include pTet-off-ERG1 (sense), pTet-off-ERG1 (antisense), pTet-off-FlagERG1 (sense) and pTet-off-FlagERG1 (antisense). Originally, the ERG1 construct in a riboprobe vector, pGEM, was obtained from Dr. Dennis K. Watson, Medical University of South Carolina. The constructs were verified by dideoxy sequencing and agarose gel analysis.

Generation of polyclonal ERG antibody:

Antibodies against ERG were generated using peptide antigens derived from the full length ERG1 coding sequence. The epitopes for the antigens were carefully selected such that the antibody specifically recognizes ERG1/2/3 and not other members of the ETS family (FIG. 9). The following peptides, having the highest hydrophilicity (−1.26 and −0.55) and antigenicity in the desired region, were used to generate antibodies:

Peptide M-50-mer: (SEQ ID NO: 13) CKALQNSPRLMHARNTDLPYEPPRRSAWTGHGHPTPQSKAAQPSPSTVPK-[NH₂]
Peptide C-49-mer: (SEQ ID NO: 14) CDFHGIAQALQPHPPESSLYKYPSDLPYMGSYHAHPQKMNFVAPHPPAL

Cysteine was added to each peptide for conjugation. Peptide M is amidated at the C-terminal residue because it is an internal peptide.

The synthesis of the peptide epitopes and the immunization of rabbits were carried out in collaboration with Bio-Synthesis Inc. Two rabbits were immunized for each of the two epitopes. Bleeds obtained post immunization were collected and tested. Subsequently, bleeds from one of the rabbits for each epitope were affinity purified using SulfoLink kit (Pierce) and were verified by immunoblot analysis.

Characterization of polyclonal ERG antibody by immunoblot analysis:

To characterize the affinity purified antibody, we transiently transfected HEK-293 (Human embryonic kidney cell line, ATCC, Manassas, Va.) with ERG1 constructs pTet-off-ERG1 (sense) and pTet-off-FlagERG1 (sense) using Lipofectamine reagent (Invitrogen, Carlsbad, Calif.) as per the manufacturer's instructions. HEK-293 cells that were not transfected with the plasmid served as a transfection control. The cells were harvested 48 hours post-transfection and processed for immunoblot analysis. Expression of ERG1 following transfection was determined by immunoblotting using the affinity purified polyclonal antisera generated against the unique M- and C-ERG epitopes described above. Endogenous ERG1 expression was not detected in non-transfected HEK-293 cells. However, the ERG antibodies detected ERG1 expression in HEK-293 cells transfected with the various ERG1 constructs. Tetracycline (2 μg/ml) abolished ERG1 expression in both tetracycline-regulated constructs, pTet-off-ERG1 (sense) and pTet-off-FlagERG1 (sense). The M2-Flag antibody specifically recognized only the Flag-tagged ERG1 protein.

Example 6 Combined Expression of ERG, AMACR, and DD3 Genes in Prostate Tumors

The strikingly high frequency of ERG over expression in CaP cells led to a comparison of ERG expression with two other genes, AMACR and DD3, that are also over expressed in CaP cells. We have evaluated quantitative gene expression features of AMACR and DD3, along with the ERG gene, in laser microdissected matched tumor and benign prostate epithelial cells from 55 CaP patients.

Although other primers and probes can be used, by way of example, we designed the following TaqMan primers and probe recognizing the DD3 gene:

Forward primer: (SEQ ID NO: 15) 5′-CACATTTCCAGCCCCTTTAAATA-3′
Reverse primer: (SEQ ID NO: 16) 5′-GGGCGAGGCTCATCGAT-3′
Probe: (SEQ ID NO: 17) 5′-GGAAGCACAGAGATCCCTGGGAGAAATG-3′

The probe has the reporter dye, 6-FAM, attached to the 5′ end and TAMRA attached to the 3′ end. The 3′-TAMRA effectively blocks extension during PCR.

AMACR TaqMan primers and probe were purchased from Applied Biosystems.

AMACR and DD3 showed upregulation in tumor cells of 78.2% and 87.3% of CaP patients, respectively (FIG. 5). ERG over expression in tumor cells was detected in 78.2% of the same group of CaP patients (FIG. 5). Comparative expression analysis revealed that when the AMACR and ERG expression data are combined, 96.4% of the CaP patients showed upregulation of either of the two genes in tumor cells (FIG. 5). Similarly, the combination of the ERG and DD3 expression data improved the cancer detection power of either of the genes to 96.4% (FIG. 5). When combining the expression data from all the three genes, 98.2% of the CaP patients showed upregulation of at least one of the three genes in tumor cells (FIG. 5). Thus, screening for ERG gene expression, alone, or in combination with other genes that are over expressed in CaP, such as AMACR and DD3, provides a new, powerful diagnostic and prognostic tool for CaP.
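By way of illustration, the combined detection rates described above correspond to counting patients whose tumor cells show upregulation of at least one gene in a panel, i.e., the union of the per-gene positive sets. With 55 patients, 96.4% corresponds to 53 patients and 98.2% to 54 patients. The sketch below uses hypothetical patient identifiers and calls for a group of ten patients; it does not reproduce the per-patient data underlying FIG. 5.

```python
# Illustrative sketch: fraction of patients positive for at least one gene
# in a marker panel. Patient identifiers and calls are hypothetical.

def combined_detection_rate(positive_by_gene, genes, n_patients):
    """Return the percentage of patients upregulated for any gene in `genes`."""
    positives = set()
    for gene in genes:
        positives |= positive_by_gene[gene]    # union of per-gene positives
    return 100.0 * len(positives) / n_patients

# Hypothetical calls for ten patients numbered 1 to 10.
positive_by_gene = {
    "ERG": {1, 2, 3, 4, 5, 6, 7},
    "AMACR": {1, 2, 3, 4, 8, 9},
    "DD3": {2, 3, 4, 5, 6, 7, 8, 9},
}
n_patients = 10

for panel in (["ERG"], ["ERG", "AMACR"], ["ERG", "DD3"], ["ERG", "AMACR", "DD3"]):
    rate = combined_detection_rate(positive_by_gene, panel, n_patients)
    print(" + ".join(panel), f"{rate:.1f}%")
```

Because the panel rate is a union, it can never be lower than the rate of its best single gene, and it rises only to the extent that the genes are positive in different patients.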

Example 7 Under Expression of LTF in Malignant Prostate Epithelium

One of the most consistently under expressed genes in CaP cells was LTF (Table 1). Validation by QRT-PCR (TaqMan) in LCM-derived tumor and benign prostate epithelial cells confirmed a consistent, tumor associated LTF under expression in 100% of CaP cells tested (FIG. 1D). As a quality control, the expression of AMACR, a recently identified CaP tissue marker, and of GSTP1, a gene showing commonly reduced or absent expression in CaP (Nelson et al., Ann. N.Y. Acad. Sci., 952:135-44 (2001)), was also determined (FIGS. 1B and 1C, respectively). Robust under expression, similar to LTF, was observed for GSTP1, while the increased expression of AMACR was noted in 95% of the tumor cells tested, confirming the high quality of the tumor and benign LCM specimens and the reliability of the QRT-PCR. In a further study, LTF expression was analyzed by QRT-PCR in microdissected tumor and benign prostate epithelial cells of 103 CaP patients. The results were consistent with the initial results, showing tumor associated under expression in 76% of patients (78 of 103).

LTF under expression was also validated at the protein level with anti-LTF goat polyclonal antibody (Santa Cruz, Calif., sc-14434) using Western blot analysis on protein lysates and immunohistochemistry techniques. Hematoxylin-eosin (H&E) and LTF staining was performed on tissue samples from 30 CaP patients by immunocytochemical analysis. In 30 of 30 (100%) cases, benign epithelial cells adjacent to tumor cells were highly positive for LTF, whereas, on average, less than 10% of prostate tumor cells revealed LTF positive cytoplasmic staining.

The specification is most thoroughly understood in light of the teachings of the references cited within the specification which are hereby incorporated by reference. The embodiments within the specification provide an illustration of embodiments of the invention and should not be construed to limit the scope of the invention. The skilled artisan readily recognizes that many other embodiments are encompassed by the invention.

1.-33. (canceled)
34. A method of diagnosing or prognosing prostate cancer in a subject, comprising: a) measuring the expression level of an ERG nucleic acid in a biological sample from the subject; and b) correlating the expression level of the ERG nucleic acid with the presence of prostate cancer in the subject or a higher predisposition of the subject to develop prostate cancer.
35. The method of claim 34, wherein the ERG nucleic acid is RNA and the expression level is measured by nucleic acid amplification.
36. The method of claim 34 or 35, wherein the ERG nucleic acid is ERG1 (SEQ ID NO: 1) or ERG2 (SEQ ID NO: 2).
37. The method of claim 34 or 35, further comprising measuring the expression levels of an AMACR nucleic acid and correlating the expression levels of the ERG nucleic acid and the AMACR nucleic acid with the presence of prostate cancer in the subject or a higher predisposition of the subject to develop prostate cancer.
38. The method of claim 37, wherein the AMACR nucleic acid is SEQ ID NO. 3.
39. The method of claim 34, 35, or 37, wherein the biological sample is chosen from a tissue sample, a blood sample, or a urine sample.
40. The method of claim 39, wherein the biological sample is a prostate tissue sample.
41. The method of claim 39, wherein the biological sample is a urine sample.
42. The method of claim 34 or 35, wherein the expression level of the ERG nucleic acid in the biological sample is compared to the expression level of the ERG nucleic acid in a control sample and wherein an increased expression of the ERG nucleic acid in the biological sample of at least two times compared to the expression of the ERG nucleic acid in the control sample indicates the presence of prostate cancer in the subject or a higher predisposition of the subject to develop prostate cancer.
43. The method of claim 42, wherein the control sample is a noncancerous biological sample from the subject.
44. The method of claim 37, wherein the expression levels of the ERG nucleic acid and the AMACR nucleic acid in the biological sample is compared to the expression levels of the ERG nucleic acid and the AMACR nucleic acid in a control sample and wherein an increased expression of the ERG nucleic acid and the AMACR nucleic acid in the biological sample of at least two times compared to the expression of the ERG nucleic acid and the AMACR nucleic acid in the control sample indicates the presence of prostate cancer in the subject or a higher predisposition of the subject to develop prostate cancer.
45. The method of claim 44, wherein the control sample is a noncancerous biological sample from the subject.
46. The method of claim 34, 35, or 37, wherein the expression level of the ERG nucleic acid is used to indicate or predict the pathologic stage of prostate cancer.
47. The method of claim 34, 35, or 37, wherein the expression level of the ERG nucleic acid is correlated with longer PSA recurrence free survival, well and moderate tumor differentiation, a pathologic T stage of pT2 or lower, or a negative surgical margin status.
48. The method of claim 34, 35, or 37, wherein the expression level of the ERG nucleic acid is correlated with well and moderate tumor differentiation.
49. The method of claim 34, 35, or 37, wherein the expression level of the ERG nucleic acid is correlated with longer PSA recurrence free survival.